Markerless 3D Analysis
Overview
The Markerless 3D Analysis module provides advanced 3D pose estimation from monocular video using a sophisticated pipeline that combines 2D pose detection, 3D lifting, ground plane anchoring, and optional DLT3D refinement.
Pipeline Overview
The module implements a complete 3D reconstruction pipeline:
- 2D Pose Detection: Extract 2D landmarks using MediaPipe
- 3D Lifting: Use VideoPose3D to lift 2D poses to 3D
- Ground Anchoring: Anchor poses to ground plane using DLT2D calibration
- Scale Calibration: Calibrate vertical scale using participant leg length
- Optional DLT3D Refinement: Further refine 3D coordinates using multi-camera calibration
Key Features
- Monocular 3D Reconstruction: Generate 3D pose data from single camera videos
- Batch Processing: Process multiple videos or CSV files simultaneously
- Multiple Input Formats: Support for video files and pre-extracted 2D CSV data
- Flexible Calibration: Support for DLT2D ground plane calibration and DLT3D multi-camera calibration
- TOML Configuration: Comprehensive configuration management with TOML files
- GUI Interface: User-friendly graphical interface for parameter configuration
- C3D Export: Export results in C3D format compatible with motion capture software
Supported Pose Models
- COCO17 Format: 17 keypoints (nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles)
- MediaPipe Mapping: Automatic conversion from MediaPipe's 33 landmarks to COCO17 format
Configuration Parameters
Input/Output Settings
- Input Directory: Directory containing video files or CSV data
- Output Directory: Directory for processed results
- File Patterns: Support for common video formats and CSV files
- Units: Metric (meters) or imperial (inches) units
Calibration Settings
- DLT2D Path: Path to 2D camera calibration file for ground plane anchoring
- DLT3D Path: Optional path to 3D multi-camera calibration file
- Leg Length: Participant leg length for vertical scale calibration (default: 0.42m)
Processing Options
- VideoPose3D Model: Choose from different pretrained models
- Batch Size: Processing batch size for VideoPose3D inference
- Temporal Smoothing: Apply temporal filtering to 3D coordinates
- Ground Anchoring: Enable/disable ground plane anchoring
- DLT3D Refinement: Enable/disable multi-camera refinement
Output Files
For each processed input, the module generates:
- 3D Coordinates CSV (
*_3d.csv): 3D pose coordinates in meters - C3D File (
*_3d.c3d): C3D format file compatible with motion capture software - Processing Log (
*_log.txt): Detailed processing information and statistics - Configuration File (
config.toml): Complete configuration used for processing
Usage
GUI Mode (Recommended)
from vaila.markerless_3d_analysis import process_videos_in_directory
# Launch GUI for configuration and processing
process_videos_in_directory()
Programmatic Usage
from vaila.markerless_3d_analysis import run_single_video
# Define configuration
config = {
'input_dir': '/path/to/videos',
'output_dir': '/path/to/output',
'dlt2d_path': '/path/to/calibration.dlt2d',
'dlt3d_path': '/path/to/calibration3d.dlt3d',
'leg_length_m': 0.42,
'units': 'm',
'batch_size': 16,
'temporal_smoothing': True
}
# Process single video
run_single_video(config, 'input_video.mp4', '/path/to/output')
Batch Processing
import glob
from vaila.markerless_3d_analysis import run_single_video
# Process all videos in directory
video_files = glob.glob('/path/to/videos/*.mp4')
for video_path in video_files:
run_single_video(config, video_path, '/path/to/output')
Calibration Setup
DLT2D Ground Plane Calibration
- Capture Calibration Video: Record a video of a known calibration pattern
- Extract 2D Points: Use the DLT2D calibration tool to extract 2D coordinates
- Generate DLT2D File: Create the calibration file using the DLT2D module
DLT3D Multi-Camera Calibration (Optional)
- Multi-Camera Setup: Set up multiple synchronized cameras
- Calibration Pattern: Use a known 3D calibration object
- Extract 2D/3D Points: Extract corresponding 2D and 3D coordinates
- Generate DLT3D File: Create the calibration file using the DLT3D module
Requirements
Core Dependencies
- Python 3.11+
- NumPy (
numpy) - OpenCV (
opencv-python) - MediaPipe (
mediapipe) - SciPy (
scipy)
3D Lifting (VideoPose3D)
- PyTorch (
torch) - VideoPose3D model files (automatically downloaded)
C3D Export
- ezc3d (
ezc3d)
Configuration
- TOML support (
tomlifor Python < 3.11, built-in for Python 3.11+)
Performance Considerations
Processing Speed
- 2D Extraction: ~30-60 FPS depending on video resolution
- 3D Lifting: ~10-20 FPS with GPU acceleration
- Calibration: ~1-5 seconds per video
Memory Usage
- Batch Processing: Adjust batch size based on available RAM
- Large Videos: Consider processing in segments for very long videos
Hardware Recommendations
- GPU: Recommended for VideoPose3D inference (significant speedup)
- RAM: 8GB+ recommended for batch processing
- Storage: ~10x input video size for output files
Accuracy Considerations
Sources of Error
- Camera Calibration: Inaccurate calibration affects absolute accuracy
- Leg Length Estimation: Incorrect leg length affects vertical scale
- Occlusion: Partial occlusions reduce tracking accuracy
- Lighting: Poor lighting conditions affect 2D detection
Improving Accuracy
- Use high-quality camera calibration
- Ensure proper leg length measurement
- Optimize lighting conditions
- Use multiple camera views when possible
Integration with vailá Ecosystem
This module integrates with other vailá tools:
- Motion Capture: Use with cluster analysis or full-body mocap tools
- Visualization: Compatible with 3D plotting and C3D viewing tools
- Data Processing: Output can be processed with filtering and smoothing tools
- Machine Learning: 3D coordinates can be used for ML model training
Troubleshooting
Common Issues
- VideoPose3D Model Download: Ensure internet connection for automatic model download
- Calibration File Errors: Verify calibration file format and parameters
- Memory Errors: Reduce batch size or process videos individually
- GPU Issues: Ensure CUDA compatibility or use CPU-only mode
Performance Optimization
- Use GPU acceleration when available
- Process videos in smaller batches
- Disable unnecessary processing steps for quick results
- Use pre-extracted 2D CSV data when available
Version History
- v0.6.0: Added DLT3D refinement and improved calibration options
- v0.5.0: Added VideoPose3D integration and ground anchoring
- v0.4.0: Added batch processing and TOML configuration
- v0.3.0: Added GUI interface and CSV input support
- v0.2.0: Initial implementation with basic 3D lifting
- v0.1.0: Proof of concept with simple monocular reconstruction