Markerless 2D Analysis - Button B1_r1_c4
Overview
The Markerless 2D Analysis button (B1_r1_c4) in the vailá GUI opens the 2D pose estimation module, which is built on MediaPipe Pose and YOLOv11. The module offers two processing modes: Standard (MediaPipe only, available in CPU and NVIDIA GPU variants) and Advanced (YOLOv11 + MediaPipe). Standard is faster and suited to single-person videos; Advanced is slower but more robust for multi-person or occluded scenes.
Button Location
- Grid Position: B1_r1_c4 (Button 1, Row 1, Column 4)
- GUI Category: Markerless Analysis
- Access Path: Main GUI → Markerless 2D Analysis
Available Versions
Version 1: Standard (MediaPipe Only - CPU)
- Script: vaila/markerless_2d_analysis.py
- Speed: Faster processing (CPU optimized)
- Use Case: Single-person scenarios, real-time applications
- Accuracy: High for single-person detection
- Features:
- MediaPipe Pose model (33 landmarks)
- Video resize functionality (2x-8x upscaling)
- Advanced filtering (Butterworth, Savitzky-Golay, LOWESS, Spline, Kalman, ARIMA)
- Batch processing with memory management
- CPU throttling for resource optimization
- TOML configuration support
- Bounding box (ROI) selection for small subjects
- Portable debug logging
Version 1 GPU: Standard (MediaPipe Only - NVIDIA GPU)
- Script: vaila/markerless_2d_analysis_nvidia.py
- Speed: Much faster processing (GPU accelerated)
- Use Case: Single-person scenarios, high-performance requirements
- Accuracy: High for single-person detection (same as CPU version)
- Features:
- All features from Version 1 (CPU)
- NVIDIA GPU acceleration via MediaPipe GPU delegate
- Automatic GPU detection and testing
- Device selection dialog (CPU/GPU choice at startup)
- GPU information display (name, driver, memory)
- Automatic fallback to CPU if GPU unavailable
- Requirements: NVIDIA GPU with CUDA support and drivers
Version 2: Advanced (YOLOv11 + MediaPipe)
- Script: vaila/markerless2d_analysis_v2.py
- Speed: Slower but more robust
- Use Case: Multi-person scenarios, complex environments
- Accuracy: Superior for multi-person and occluded scenarios
- Features:
- YOLOv11 person detection + MediaPipe pose estimation
- YOLO11-pose models (nano, small, medium, large, extra-large)
- YOLO-only mode (17 keypoints from YOLO11-pose)
- YOLO+MediaPipe hybrid mode
- GPU/CPU automatic detection
- Temporal filtering (Kalman, Savitzky-Golay)
- Enhanced multi-person tracking
Technical Specifications
Supported Input Formats
- Video Formats: .mp4, .avi, .mov
- Resolution: Any (automatic batch processing for high-res videos)
- Frame Rate: Any (automatically detected)
Output Files
For each processed video, the module generates:
- Annotated Video (*_mp.mp4) - Original video with pose landmarks overlaid
  - Green circles for landmarks
  - Red lines for connections
  - Optional bounding boxes (YOLO mode)
- Normalized Coordinates (*_mp_norm.csv) - 33 landmarks (MediaPipe) or 17 keypoints (YOLO-only)
  - Coordinates normalized to 0-1 scale
  - Format: frame_index, landmark_x, landmark_y, landmark_z
- Pixel Coordinates (*_mp_pixel.csv) - Coordinates in pixel format
  - Original video resolution
  - Format: frame_index, landmark_x_px, landmark_y_px, landmark_z
- Log File (log_info.txt) - Processing metadata
  - Hardware configuration
  - Pipeline configuration (MediaPipe/YOLO)
  - Detection statistics
  - Performance metrics
- Configuration File (configuration_used.toml) - All parameters used for processing
  - Reusable for batch processing
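As an illustration of how the coordinate files might be consumed, the sketch below reads one landmark from a normalized-coordinate CSV and converts it to pixels by multiplying by the frame width and height. The column names (`nose_x`, `nose_y`) and the empty-cell convention for missing detections are assumptions made for this example; check the header of your own `*_mp_norm.csv` before relying on them.

```python
import csv
import io


def norm_to_pixel(x_norm, y_norm, width, height):
    """Convert normalized (0-1) coordinates to pixel coordinates."""
    return x_norm * width, y_norm * height


def load_landmark_xy(csv_file, name):
    """Return a list of (frame_index, x, y) for one landmark.

    Assumes columns named '<name>_x' and '<name>_y' (hypothetical naming).
    Empty cells are treated as missing detections and returned as None.
    """
    rows = []
    for row in csv.DictReader(csv_file):
        frame = int(row["frame_index"])
        x, y = row[f"{name}_x"], row[f"{name}_y"]
        if x == "" or y == "":
            rows.append((frame, None, None))
        else:
            rows.append((frame, float(x), float(y)))
    return rows


# Tiny in-memory stand-in for a *_mp_norm.csv file (illustrative values only).
sample = io.StringIO(
    "frame_index,nose_x,nose_y,nose_z\n"
    "0,0.50,0.25,-0.10\n"
    "1,,,\n"
)
track = load_landmark_xy(sample, "nose")

# Convert to pixels for a 1920x1080 video, keeping missing frames as None.
pixels = [
    (f, *norm_to_pixel(x, y, 1920, 1080)) if x is not None else (f, None, None)
    for f, x, y in track
]
```

For a real file, replace `sample` with `open(path, newline="")`. In most cases the `*_mp_pixel.csv` file already holds the pixel values, so the conversion is mainly useful for checking the two files against each other.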
Landmark Detection
MediaPipe Landmarks (33 points)
- Face: Nose, eyes (inner, center, outer), ears, mouth corners
- Upper Body: Shoulders, elbows, wrists, hands (pinky, index, thumb)
- Lower Body: Hips, knees, ankles, heels, feet
YOLO11-Pose Keypoints (17 points)
- Face: Nose, left/right eye, left/right ear
- Upper Body: Left/right shoulder, elbow, wrist
- Lower Body: Left/right hip, knee, ankle
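When comparing YOLO-only output (17 keypoints) with MediaPipe output (33 landmarks), it can help to map one index set onto the other. The sketch below uses the publicly documented MediaPipe Pose landmark order and the COCO keypoint order used by YOLO pose models. The snake_case names are a convention chosen for this example; the column names written by the module may differ.

```python
# MediaPipe Pose landmark order (33 points).
MEDIAPIPE33_NAMES = [
    "nose",
    "left_eye_inner", "left_eye", "left_eye_outer",
    "right_eye_inner", "right_eye", "right_eye_outer",
    "left_ear", "right_ear",
    "mouth_left", "mouth_right",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_pinky", "right_pinky",
    "left_index", "right_index",
    "left_thumb", "right_thumb",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
    "left_heel", "right_heel",
    "left_foot_index", "right_foot_index",
]

# COCO keypoint order (17 points), as used by YOLO pose models.
COCO17_NAMES = [
    "nose",
    "left_eye", "right_eye",
    "left_ear", "right_ear",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
]

# COCO index -> MediaPipe index for the 17 points both sets share.
COCO17_TO_MEDIAPIPE33 = {
    coco_idx: MEDIAPIPE33_NAMES.index(name)
    for coco_idx, name in enumerate(COCO17_NAMES)
}
```

Every COCO keypoint has a MediaPipe counterpart, but not the reverse: MediaPipe's hand, heel, foot, mouth, and inner/outer eye landmarks have no equivalent in YOLO-only mode.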
Configuration Parameters
MediaPipe Settings
- min_detection_confidence (0.0-1.0): Threshold to start detecting poses
- min_tracking_confidence (0.0-1.0): Threshold to keep tracking poses
- model_complexity (0-2): 0=fastest, 1=balanced, 2=most accurate
- enable_segmentation (True/False): Draw person outline
- smooth_segmentation (True/False): Smooth the outline
- static_image_mode (True/False): Treat each frame separately
- apply_filtering (True/False): Apply built-in smoothing
- estimate_occluded (True/False): Guess hidden body parts
YOLO Settings (Version 2 only)
- use_yolo (True/False): Enable YOLO person detection
- yolo_mode:
  - yolo_only: Use only YOLO11-pose (17 keypoints)
  - yolo_mediapipe: Use YOLO for detection + MediaPipe for pose (33 landmarks)
- yolo_model: Model selection
  - yolo11n-pose.pt: Nano (fastest, smallest)
  - yolo11s-pose.pt: Small
  - yolo11m-pose.pt: Medium
  - yolo11l-pose.pt: Large
  - yolo11x-pose.pt: Extra Large (most accurate)
- yolo_conf (0.0-1.0): YOLO confidence threshold
Video Processing Settings
- enable_resize (True/False): Upscale video for better detection
- resize_scale (2-8): Scale factor (higher = better detection but slower)
- enable_padding (True/False): Add initial frames for stabilization
- pad_start_frames (0-120): Number of padding frames
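To make the cost of resize_scale concrete, the sketch below computes the upscaled frame size and maps a point detected on the upscaled frame back to the original resolution. Both helper names are hypothetical, and the module's internal handling is not shown in this document; the arithmetic only assumes a uniform scale factor in both dimensions.

```python
def upscaled_size(width, height, scale):
    """Frame size after uniform upscaling by an integer factor in 2-8."""
    if not 2 <= scale <= 8:
        raise ValueError(f"resize_scale must be between 2 and 8, got {scale}")
    return width * scale, height * scale


def to_original_pixels(x_up, y_up, scale):
    """Map pixel coordinates on the upscaled frame back to the original frame."""
    return x_up / scale, y_up / scale


# A 640x360 video upscaled 4x is processed as 2560x1440,
# i.e. 16 times as many pixels per frame.
big_w, big_h = upscaled_size(640, 360, 4)
pixel_ratio = (big_w * big_h) / (640 * 360)

# The centre of the upscaled frame maps to the centre of the original.
x_orig, y_orig = to_original_pixels(1280.0, 720.0, 4)
```

The pixel count grows with the square of the scale factor, which is why higher values slow processing so noticeably. Normalized (0-1) coordinates are unaffected by a uniform resize.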
Advanced Filtering (Version 1 only)
- enable_advanced_filtering (True/False): Apply smoothing and gap filling
- interp_method: linear, cubic, nearest, kalman, none
- smooth_method: none, butterworth, savgol, lowess, kalman, splines, arima
- max_gap (frames): Maximum gap size to fill
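The interaction between interp_method and max_gap can be illustrated with a small, self-contained sketch: missing samples (NaN) are filled by linear interpolation only when the run of missing frames is no longer than max_gap. This is an illustration of the idea, not the module's implementation; `fill_gaps_linear` is a hypothetical helper, and leaving leading and trailing gaps unfilled is a choice made for this example.

```python
import math


def fill_gaps_linear(values, max_gap):
    """Linearly interpolate runs of NaN that are at most max_gap frames long.

    Longer runs, and runs touching the start or end of the series,
    are left as NaN because they have no valid neighbour on both sides
    or exceed the allowed gap size.
    """
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if not math.isnan(out[i]):
            i += 1
            continue
        start = i  # first missing index of this run
        while i < n and math.isnan(out[i]):
            i += 1
        end = i  # first valid index after the run, or n
        if start == 0 or end == n or (end - start) > max_gap:
            continue
        left, right = out[start - 1], out[end]
        span = end - (start - 1)
        for k in range(start, end):
            t = (k - (start - 1)) / span
            out[k] = left + t * (right - left)
    return out


nan = float("nan")
series = [0.0, nan, nan, 3.0, nan, nan, nan, 7.0]
filled = fill_gaps_linear(series, max_gap=2)
```

With max_gap=2 the two-frame gap is filled and the three-frame gap is left untouched, which keeps the filter from inventing long stretches of motion.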
Temporal Filtering (Version 2 only)
- filter_type: none, kalman, savgol
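For readers unfamiliar with what the kalman option does to a trajectory, here is a minimal scalar Kalman filter applied to a single coordinate. It assumes a random-walk state model and is only a sketch of the technique; the module's own filter, its state model, and its tuning are not described here, and `process_var` and `measurement_var` are hypothetical parameter names.

```python
def kalman_smooth_1d(measurements, process_var=1e-3, measurement_var=1e-1):
    """Forward-pass scalar Kalman filter with a random-walk state model.

    process_var:     how much the true value is expected to drift per frame.
    measurement_var: how noisy each detected coordinate is assumed to be.
    """
    if not measurements:
        return []
    estimate = measurements[0]
    error = 1.0  # initial uncertainty of the estimate
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed constant, so only uncertainty grows.
        error += process_var
        # Update: blend prediction and measurement according to the gain.
        gain = error / (error + measurement_var)
        estimate += gain * (z - estimate)
        error *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed


# A sudden jump in the input is followed only partially at first.
step_response = kalman_smooth_1d([0.0, 0.0, 0.0, 10.0])
```

A larger measurement_var relative to process_var gives heavier smoothing and more lag; the reverse makes the output follow the raw detections more closely.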
Performance Characteristics
Version 1 (Standard - CPU)
- Processing Speed: ~30-60 FPS (CPU, depends on resolution)
- Memory Usage: Moderate (batch processing for large videos)
- Best For: Single-person, high-quality videos
- Hardware: CPU optimized, Linux batch processing
Version 1 GPU (Standard - NVIDIA GPU)
- Processing Speed: ~60-150+ FPS (GPU, depends on GPU model and resolution)
- Memory Usage: Moderate (GPU memory optimized)
- Best For: Single-person, high-performance requirements, batch processing
- Hardware: NVIDIA GPU with CUDA support required
- Speedup: 2-5x faster than CPU version (depending on GPU)
Version 2 (Advanced)
- Processing Speed: ~10-30 FPS (CPU), ~60-120 FPS (GPU)
- Memory Usage: Higher (YOLO model loading)
- Best For: Multi-person, complex scenarios, occlusions
- Hardware: GPU recommended, CPU fallback available
Usage Workflow
Step 1: Launch Module
- Click Markerless 2D Analysis button (B1_r1_c4)
- Select version:
- 1: Standard (MediaPipe only - CPU)
- 1 GPU: Standard (MediaPipe only - NVIDIA GPU) - NEW!
- 2: Advanced (YOLOv11 + MediaPipe)
Step 2: Device Selection (Version 1 GPU only)
- Automatic GPU Detection: The script automatically detects NVIDIA GPU availability
- GPU Testing: MediaPipe GPU delegate is tested automatically
- Device Selection Dialog: Choose between:
- CPU: Standard processing (always available)
- GPU: NVIDIA CUDA acceleration (if available and tested)
- GPU Information Display: Shows GPU name, driver version, memory, and test status
- Automatic Fallback: If GPU test fails, CPU is used automatically
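The detection and fallback logic can be sketched as follows. The example queries nvidia-smi for GPU name, driver version, and total memory, and falls back to CPU if the tool is missing, fails, or the delegate test did not pass. The function names are hypothetical, and `delegate_ok` is a placeholder for the result of the MediaPipe GPU delegate test, which is not reproduced here.

```python
import shutil
import subprocess


def parse_nvidia_smi_csv(text):
    """Parse 'name, driver, memory' lines from nvidia-smi CSV output."""
    gpus = []
    for line in text.strip().splitlines():
        # Split from the right so a comma inside the GPU name is preserved.
        parts = [p.strip() for p in line.rsplit(",", 2)]
        if len(parts) == 3:
            gpus.append({"name": parts[0], "driver": parts[1], "memory": parts[2]})
    return gpus


def query_nvidia_gpus(timeout=5):
    """Return a list of detected NVIDIA GPUs, or an empty list on any failure."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            [
                "nvidia-smi",
                "--query-gpu=name,driver_version,memory.total",
                "--format=csv,noheader",
            ],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return []
    return parse_nvidia_smi_csv(result.stdout)


def choose_device(gpus, delegate_ok):
    """Use the GPU only if one was found AND the delegate test passed."""
    return "gpu" if gpus and delegate_ok else "cpu"


gpus = query_nvidia_gpus()
device = choose_device(gpus, delegate_ok=False)  # placeholder test result
```

Treating every failure mode as "no GPU" keeps the CPU path always reachable, which matches the automatic fallback described above.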
Step 3: Configure Parameters
- Select input directory containing videos
- Select output base directory
- Configure detection parameters via GUI or load TOML file
Step 4: Process Videos
- Module processes all videos in input directory
- Progress displayed in terminal
- GPU/CPU usage information shown (Version 1 GPU)
- Output files saved to timestamped directory
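The batch step amounts to collecting the supported video files and creating a timestamped output folder. A minimal sketch, assuming the three extensions listed under Supported Input Formats; the `markerless_` prefix and the timestamp layout are hypothetical, and the module's actual directory naming may differ.

```python
from datetime import datetime
from pathlib import Path

VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov"}


def find_videos(input_dir):
    """Return supported video files in input_dir, sorted by path."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )


def make_output_dir(base_dir, now=None):
    """Create and return a timestamped output directory under base_dir."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    out_dir = Path(base_dir) / f"markerless_{stamp}"  # hypothetical naming
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

Matching on the lowercased suffix means files such as `TRIAL01.MOV` are picked up as well, which is a common source of "missing" videos on case-sensitive file systems.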
Step 5: Review Results
- Check annotated videos for quality
- Review CSV files for coordinate data
- Examine log files for statistics
Integration with vailá Ecosystem
Data Flow
Video Input → Pose Detection → Coordinate Extraction → CSV Export
↓
Annotated Video
↓
Integration with other modules:
- DLT Calibration (3D reconstruction)
- Visualization (2D/3D plotting)
- Machine Learning (model training)
- Multimodal Analysis (IMU, force plate, etc.)
Compatible Modules
- 3D Reconstruction: Use pixel coordinates with DLT calibration
- Visualization: Direct import to vailá plotting modules
- Data Processing: Compatible with filtering and interpolation tools
- Machine Learning: Training data for pose estimation models
Requirements
System Requirements
- Python: 3.12.12+
- OS: Linux, macOS, Windows
- RAM: 4GB minimum (8GB+ recommended for large videos)
- GPU:
- Version 1 GPU: NVIDIA GPU with CUDA support required
- Version 2: Optional but recommended (NVIDIA GPU with CUDA)
- Version 1 (CPU): No GPU required
GPU Requirements (Version 1 GPU)
- NVIDIA GPU: Any CUDA-capable NVIDIA GPU
- NVIDIA Drivers: Latest drivers installed
- CUDA Toolkit: Required for MediaPipe GPU delegate
- MediaPipe: Version 0.10.31+ with GPU delegate support
- Testing: Automatic GPU detection and MediaPipe delegate testing
Python Dependencies
# Core dependencies
opencv-python>=4.8.0
mediapipe>=0.10.0
numpy>=1.24.0
pandas>=2.0.0
# Version 1 additional
scipy>=1.10.0
scikit-learn>=1.3.0
statsmodels>=0.14.0
pykalman>=0.9.5
toml>=0.10.2
psutil>=5.9.0
rich>=13.0.0
# Version 2 additional
ultralytics>=8.0.0
torch>=2.0.0
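Before launching the module, it can be useful to see which of these packages are installed and at what version. The sketch below uses the standard-library `importlib.metadata`; it only reports versions and does not compare them against the minimums above. The grouping constants and the `installed_versions` helper are introduced for this example.

```python
from importlib.metadata import PackageNotFoundError, version

CORE = ["opencv-python", "mediapipe", "numpy", "pandas"]
VERSION_1_EXTRA = [
    "scipy", "scikit-learn", "statsmodels", "pykalman",
    "toml", "psutil", "rich",
]
VERSION_2_EXTRA = ["ultralytics", "torch"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(CORE + VERSION_1_EXTRA + VERSION_2_EXTRA)
missing = [name for name, ver in report.items() if ver is None]
```

Printing `missing` gives a quick list of packages to install for the version you intend to run.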
Troubleshooting
Common Issues
- Memory Errors
  - Solution: Use batch processing (automatic on Linux)
  - Alternative: Reduce video resolution or disable resize
- Slow Processing
  - Solution: Use lower model complexity or smaller YOLO model
  - Alternative:
    - Use Version 1 GPU (NVIDIA GPU) for 2-5x speedup
    - Enable GPU acceleration (Version 2)
- Poor Detection
  - Solution: Enable video resize (2x-4x)
  - Alternative: Adjust confidence thresholds
  - For multi-person: Use Version 2 (Advanced)
- Missing Landmarks
  - Solution: Enable occlusion estimation
  - Alternative: Use advanced filtering to fill gaps
Performance Optimization Tips
- Single-person videos:
- Use Version 1 GPU (NVIDIA GPU) for best performance
- Use Version 1 (CPU) if no GPU available
- Multi-person videos: Use Version 2 (Advanced)
- High-resolution videos: Enable batch processing (automatic)
- Low-quality videos: Enable resize (2x-4x)
- Real-time applications: Use Version 1 GPU with model_complexity=0
- GPU Acceleration:
- Version 1 GPU provides 2-5x speedup over CPU
- Automatic GPU detection and testing ensures compatibility
- Fallback to CPU if GPU unavailable or test fails
Version History
Version 0.7.1 (Current - Standard CPU)
- Added batch processing for Linux
- Improved memory management
- Enhanced filtering options
- TOML configuration support
- Bounding box (ROI) selection for small subjects
- Portable debug logging
Version 0.7.1 (Current - Standard GPU) - NEW!
- NVIDIA GPU acceleration via MediaPipe GPU delegate
- Automatic GPU detection using nvidia-smi
- MediaPipe GPU delegate testing before use
- Device selection dialog for CPU/GPU choice
- GPU information display (name, driver, memory)
- Automatic fallback to CPU if GPU unavailable
- All features from CPU version
- 2-5x performance improvement over CPU version
Version 0.0.2 (Current - Advanced)
- YOLO11-pose integration
- Multi-person detection
- GPU/CPU automatic detection
- YOLO-only mode support
- Enhanced temporal filtering
References
- MediaPipe: Google MediaPipe Pose
- YOLOv11: Ultralytics YOLOv11
- vailá Repository: GitHub
Support
For issues, questions, or contributions:
- Email: paulosantiago@usp.br
- GitHub Issues: vaila-multimodaltoolbox/vaila/issues
- Documentation: vailá Documentation
Last Updated: November 2025
Maintained by: Paulo Roberto Pereira Santiago
License: AGPLv3.0