# PixelToVoxelProjector Benchmark Suite

Comprehensive performance benchmarking suite for the PixelToVoxelProjector system.

## Overview

This benchmark suite provides detailed performance analysis across all major components:

- **Main Benchmark Suite** (`benchmark_suite.py`) - End-to-end pipeline benchmarking
- **Camera Benchmarks** (`camera_benchmark.py`) - 8K video processing performance
- **Voxel Benchmarks** (`voxel_benchmark.cu`) - CUDA kernel performance
- **Network Benchmarks** (`network_benchmark.py`) - Streaming performance

## Requirements

### Python Dependencies

```bash
pip install -r requirements.txt
```

### CUDA Requirements (for voxel_benchmark.cu)

- NVIDIA GPU with CUDA support
- CUDA Toolkit 11.0 or later
- nvcc compiler

## Installation

1. Install Python dependencies:

   ```bash
   cd /home/user/Pixeltovoxelprojector/tests/benchmarks
   pip install -r requirements.txt
   ```

2. Compile CUDA benchmarks:

   ```bash
   make voxel_benchmark
   ```

## Usage

### Run All Benchmarks

```bash
python run_all_benchmarks.py
```

### Run Individual Benchmarks

**Main Benchmark Suite:**

```bash
python benchmark_suite.py
```

**Camera Pipeline Benchmarks:**

```bash
python camera_benchmark.py
```

**CUDA Voxel Benchmarks:**

```bash
./voxel_benchmark
```

**Network Benchmarks:**

```bash
python network_benchmark.py
```

## Benchmark Details

### 1. Main Benchmark Suite

**Tests:**

- Voxel ray casting performance
- Motion detection (8K frames)
- Voxel grid update throughput
- End-to-end pipeline latency

**Metrics:**

- Throughput (FPS)
- Latency percentiles (p50, p95, p99)
- CPU/GPU utilization
- Memory usage

**Output:**

- JSON results file
- CSV summary
- HTML report with graphs
- Performance baseline for regression detection

### 2. Camera Benchmarks

**Tests:**

- 8K video decode performance
- Motion extraction at multiple resolutions
- Multi-camera synchronization (8 cameras)
- Frame drop detection and analysis
- End-to-end camera pipeline

**Metrics:**

- Decode FPS and latency
- Motion detection throughput
- Synchronization accuracy
- Packet loss rates

**Output:**

- JSON results in `benchmark_results/camera/`

### 3. CUDA Voxel Benchmarks

**Tests:**

- Ray casting with DDA algorithm
- Atomic voxel updates
- Memory bandwidth (coalesced access)
- Voxel reduction operations

**Metrics:**

- Kernel execution time
- Throughput (GOPS)
- Memory bandwidth (GB/s)
- Grid size scalability

**Output:**

- Console output with detailed metrics
- Kernel configuration (blocks, threads)

### 4. Network Benchmarks

**Tests:**

- TCP throughput
- UDP throughput with packet loss tracking
- Latency measurement (ping-pong)
- Multi-client scalability
- Streaming latency (simulating voxel data)

**Metrics:**

- Throughput (Mbps)
- Latency (avg, p95, p99)
- Packet loss percentage
- Jitter
- Multi-client aggregate throughput

**Output:**

- JSON results in `benchmark_results/network/`

## Performance Baselines

The benchmark suite supports performance regression detection:

1. Run initial benchmarks to establish baseline:

   ```bash
   python benchmark_suite.py
   # When prompted, save as baseline: y
   ```

2. Future runs will compare against baseline and report regressions
3. Baselines are stored in: `benchmark_results/baselines.json`

## Interpreting Results

### Throughput

- Higher is better
- Target: >30 FPS for real-time processing
- 8K decode: 30-60 FPS typical
- Motion detection: 50-100 FPS typical

### Latency

- Lower is better
- Target p99 latency: <33ms (for 30 FPS)
- p50 should be <10ms for interactive performance

### GPU Utilization

- 70-95% indicates good GPU usage
- <50% may indicate CPU bottleneck
- >98% may indicate over-saturation

### Memory Bandwidth

- Modern GPUs: 300-900 GB/s theoretical
- Actual: 60-80% of theoretical is good
- <50% indicates inefficient memory access patterns

### Packet Loss

- TCP: Should be 0%
- UDP: <1% acceptable for real-time
- >5% indicates network issues

## Example Output

```
========================================
Benchmark: Voxel Ray Casting (500^3)
========================================
Duration: 2450.32 ms
Throughput: 40.81 FPS
Latency (p50): 23.12 ms
Latency (p95): 28.45 ms
Latency (p99): 31.67 ms
CPU Util: 45.2%
Memory: 1234.56 MB
GPU Util: 87.3%
GPU Memory: 2345.67 MB

No performance regressions detected.
```

## Troubleshooting

### GPU Not Detected

If CUDA benchmarks fail to find GPU:

```bash
nvidia-smi       # Check GPU is visible
nvcc --version   # Check CUDA toolkit installed
```

### Python Benchmarks Slow

1. Ensure OpenCV is using optimized build:

   ```bash
   python -c "import cv2; print(cv2.getBuildInformation())"
   ```

2. Check for CPU-only operations (should use GPU when available)

### Network Benchmarks Show High Latency

When testing on localhost (127.0.0.1):

- Latency will be very low (< 1ms typical)
- For realistic results, test between separate machines
- Firewall rules may affect results

## Customization

### Adjust Test Parameters

Edit the benchmark scripts to modify:

- Grid sizes
- Number of iterations
- Test duration
- Resolution settings

Example:

```python
suite.run_benchmark(
    "Custom Test",
    benchmark_function,
    iterations=200,  # Increase for more accuracy
    warmup=20,       # More warmup iterations
    grid_size=1000   # Larger grid
)
```

### Add Custom Benchmarks

1. Create benchmark function:

   ```python
   def my_custom_benchmark(param1, param2):
       # Your code here
       pass
   ```

2. Add to suite:

   ```python
   suite.run_benchmark(
       "My Custom Test",
       my_custom_benchmark,
       iterations=100,
       param1=value1,
       param2=value2
   )
   ```

## CI/CD Integration

For automated performance testing:

```bash
# Run benchmarks and exit with error on regression
python benchmark_suite.py --check-regression --exit-on-failure
```

## Performance Optimization Tips

Based on benchmark results:

1. **Low GPU Utilization**: Increase batch size or parallelize more work
2. **High CPU Utilization**: Move more work to GPU
3. **High Memory Usage**: Optimize data structures or streaming
4. **High Latency**: Check for synchronization points or blocking operations
5. **Low Throughput**: Profile to find bottlenecks

## Contributing

When adding new benchmarks:

1. Follow existing structure
2. Include warmup iterations
3. Report multiple metrics (throughput, latency, utilization)
4. Add documentation
5. Include baseline values

## License

Same as parent project.
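
## Appendix: Timing Harness Sketch

This README reports throughput, latency percentiles (p50, p95, p99), warmup iterations, and baseline regression checks, but does not show how they are computed. The following is a minimal sketch of one way such a harness could work, not the actual code in `benchmark_suite.py`. The function names `run_benchmark` and `find_regressions`, the result keys, and the 10% `tolerance` are all assumptions made for illustration; the argument order only mirrors the `suite.run_benchmark(...)` calls shown under Customization.

```python
import statistics
import time


def run_benchmark(name, func, iterations=100, warmup=10, **kwargs):
    """Time func(**kwargs) and summarize throughput and latency percentiles.

    Hypothetical helper: the real suite's implementation may differ.
    Requires iterations >= 2 so that percentiles can be computed.
    """
    # Warmup runs are executed but not recorded.
    for _ in range(warmup):
        func(**kwargs)

    latencies_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        func(**kwargs)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    total_ms = sum(latencies_ms)
    return {
        "name": name,
        "duration_ms": total_ms,
        "throughput_fps": iterations / (total_ms / 1000.0) if total_ms > 0 else float("inf"),
        "latency_p50_ms": cuts[49],
        "latency_p95_ms": cuts[94],
        "latency_p99_ms": cuts[98],
    }


def find_regressions(current, baseline, tolerance=0.10):
    """Return the metric names that are worse than baseline by more than tolerance.

    Assumption: keys starting with "throughput" are higher-is-better;
    every other numeric key (latency, duration) is lower-is-better.
    """
    regressions = []
    for key, base in baseline.items():
        if key not in current or not isinstance(base, (int, float)) or base == 0:
            continue
        change = (current[key] - base) / base
        if key.startswith("throughput"):
            if change < -tolerance:
                regressions.append(key)
        elif change > tolerance:
            regressions.append(key)
    return regressions
```

Under these assumptions, a result from `run_benchmark("My Custom Test", my_custom_benchmark, iterations=100, param1=value1, param2=value2)` could be passed to `find_regressions` together with a previously saved result. An empty list would correspond to the "No performance regressions detected." line in the example output.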