# Deployment Guide ## Overview This guide provides comprehensive instructions for deploying the 8K Motion Tracking and Voxel Processing System in production environments, including hardware setup, network configuration, camera calibration, performance tuning, and troubleshooting. --- ## Table of Contents 1. [Hardware Requirements](#hardware-requirements) 2. [Network Configuration](#network-configuration) 3. [Camera Setup and Calibration](#camera-setup-and-calibration) 4. [Software Installation](#software-installation) 5. [System Configuration](#system-configuration) 6. [Performance Tuning](#performance-tuning) 7. [Monitoring and Maintenance](#monitoring-and-maintenance) 8. [Troubleshooting Guide](#troubleshooting-guide) 9. [Security Considerations](#security-considerations) 10. [Backup and Recovery](#backup-and-recovery) --- ## Hardware Requirements ### Minimum Configuration (Development/Testing) #### Single Node Setup - **CPU**: Intel Core i7-10700K or AMD Ryzen 7 3700X (8 cores) - **RAM**: 32GB DDR4-3200 - **GPU**: NVIDIA RTX 3060 (12GB VRAM) - **Storage**: 500GB NVMe SSD - **Network**: 1GbE network adapter - **Operating System**: Ubuntu 20.04 LTS or later **Capacity**: 2 camera pairs (4 cameras) ### Recommended Configuration (Production) #### Multi-Node Cluster **Master Node**: - **CPU**: Intel Core i9-13900K or AMD Ryzen 9 7950X (16+ cores) - **RAM**: 64GB DDR5-5600 - **GPU**: NVIDIA RTX 4090 (24GB VRAM) x1 - **Storage**: 2TB NVMe SSD (for OS and application) - **Storage**: 8TB HDD (for data archival) - **Network**: 10GbE or InfiniBand (100 Gbps) - **Operating System**: Ubuntu 22.04 LTS **Worker Nodes** (4+ nodes): - **CPU**: Intel Core i9-13900K or AMD Ryzen 9 7950X - **RAM**: 64GB DDR5-5600 - **GPU**: NVIDIA RTX 4090 (24GB VRAM) x2-4 - **Storage**: 2TB NVMe SSD - **Network**: 10GbE or InfiniBand - **Operating System**: Ubuntu 22.04 LTS **Capacity**: 10 camera pairs (20 cameras) per cluster ### Camera Hardware **Per Camera Pair**: - **Monochrome Camera**: 8K (7680x4320) GigE Vision camera - Frame rate: 30 FPS minimum - Sensor: Monochrome CMOS (1" or larger) - Interface: GigE Vision (10GbE preferred) - Triggering: Hardware trigger support - Example: FLIR Blackfly S BFS-U3-200S6M-C - **Thermal Camera**: 8K thermal imaging camera - Frame rate: 30 FPS minimum - Sensor: Uncooled microbolometer - Interface: GigE Vision (10GbE preferred) - Triggering: Hardware trigger support - Temperature range: -20°C to 150°C - Example: FLIR A700 series **Camera Mounting**: - Rigid mounting bracket for camera pair - Stereo baseline: 0.3-0.8 meters (adjustable) - Vibration isolation mounts - Weather-resistant enclosure (for outdoor deployment) ### Network Infrastructure **Network Switches**: - 10GbE managed switch with 24+ ports - Jumbo frame support (MTU 9000) - VLAN support - QoS/traffic shaping - Low latency (<1 microsecond) - Example: Cisco Nexus 3000 series, Arista 7050 series **Network Cables**: - Cat6A or Cat7 Ethernet cables for 10GbE - Maximum cable length: 100 meters for Cat6A - Proper cable management and labeling **Optional: InfiniBand**: - Mellanox ConnectX-5 or newer adapters (100 Gbps) - InfiniBand switches (e.g., Mellanox SN2700) - QSFP28 cables ### Power Requirements **Per Worker Node**: - Power supply: 1200W+ 80 Plus Platinum - UPS backup: 2000VA for critical nodes - Power consumption: ~600-800W under load **Total Cluster** (Master + 4 Workers): - Estimated power: 3-4 kW - Recommended UPS: 6000VA+ rack-mount UPS - Redundant power supplies recommended ### Rack and Cooling **Rack Requirements**: - Standard 19" server rack - 42U minimum height - Proper cable management - KVM switch for management **Cooling**: - Target ambient temperature: 20-22°C - Airflow: Front-to-back - Minimum 1000 CFM for 5-node cluster - Hot aisle/cold aisle configuration recommended --- ## Network Configuration ### Network Topology #### Option 1: Flat Network (Simple, <10 nodes) ``` ┌─────────────────────────────────────────────────────────┐ │ 10GbE Switch │ │ │ │ Port 1-10: Cameras (192.168.1.10-19) │ │ Port 11: Master Node (192.168.1.100) │ │ Port 12-15: Worker Nodes (192.168.1.101-104) │ │ Port 20: Management/Internet Gateway │ └─────────────────────────────────────────────────────────┘ ``` #### Option 2: Segmented Network (Production, 10+ nodes) ``` ┌────────────────────────────────────────────────────────────┐ │ Core Switch (10GbE) │ └───────┬──────────────────┬──────────────────┬─────────────┘ │ │ │ ┌───────▼──────────┐ ┌─────▼──────────┐ ┌───▼──────────────┐ │ Camera Network │ │ Cluster Network│ │ Management Net │ │ VLAN 10 │ │ VLAN 20 │ │ VLAN 99 │ │ 192.168.10.0/24 │ │ 10.0.0.0/24 │ │ 192.168.99.0/24 │ │ │ │ │ │ │ │ Cameras 1-20 │ │ Master + Workers│ │ Admin Access │ └──────────────────┘ └────────────────┘ └──────────────────┘ ``` ### IP Address Assignment **Camera Network** (192.168.10.0/24): ``` 192.168.10.10-29 Monochrome cameras 0-9 (pairs 0-9) 192.168.10.30-49 Thermal cameras 0-9 (pairs 0-9) 192.168.10.1 Gateway/DHCP server ``` **Cluster Network** (10.0.0.0/24): ``` 10.0.0.1 Master node 10.0.0.10-13 Worker nodes 0-3 10.0.0.100-103 Worker node IB/RDMA interfaces ``` **Management Network** (192.168.99.0/24): ``` 192.168.99.1 Core switch management 192.168.99.10 Master node management 192.168.99.11-14 Worker node management interfaces ``` ### Network Settings #### MTU Configuration Enable jumbo frames for better throughput: ```bash # On all nodes and switches sudo ip link set eth0 mtu 9000 # Make persistent (Ubuntu) cat <> ~/.bashrc export CUDA_HOME=/usr/local/cuda-12.0 export PATH=\$CUDA_HOME/bin:\$PATH export LD_LIBRARY_PATH=\$CUDA_HOME/lib64:\$LD_LIBRARY_PATH EOF source ~/.bashrc # 5. Verify installation nvcc --version nvidia-smi ``` ### Python and Dependencies ```bash # 1. Install Python 3.10 sudo apt-get install -y \ python3.10 \ python3.10-dev \ python3-pip # 2. Create virtual environment python3.10 -m venv /opt/motion-tracking/venv source /opt/motion-tracking/venv/bin/activate # 3. Install Python packages pip install --upgrade pip setuptools wheel pip install \ numpy==1.24.3 \ opencv-python==4.7.0 \ pybind11==2.10.4 \ protobuf==4.22.3 \ grpcio==1.53.0 \ grpcio-tools==1.53.0 \ pyzmq==25.0.2 \ scipy==1.10.1 \ scikit-image==0.20.0 ``` ### Application Installation ```bash # 1. Clone repository cd /opt sudo git clone motion-tracking cd motion-tracking # 2. Set ownership sudo chown -R $USER:$USER /opt/motion-tracking # 3. Build C++ extensions python setup.py build_ext --inplace # 4. Verify build python verify_tracking_system.py # 5. Install systemd service sudo cp scripts/motion-tracking.service /etc/systemd/system/ sudo systemctl daemon-reload sudo systemctl enable motion-tracking ``` ### RDMA/InfiniBand Setup (Optional) ```bash # 1. Install RDMA packages sudo apt-get install -y \ rdma-core \ libibverbs1 \ librdmacm1 \ ibverbs-utils \ infiniband-diags # 2. Load kernel modules sudo modprobe ib_uverbs sudo modprobe rdma_ucm sudo modprobe mlx5_core sudo modprobe mlx5_ib # 3. Verify RDMA devices ibv_devices # Should list Mellanox adapters # 4. Test RDMA performance ib_write_bw -d mlx5_0 # 5. Configure persistent modules cat < prometheus.yml global: scrape_interval: 5s scrape_configs: - job_name: 'motion-tracking' static_configs: - targets: ['localhost:9090'] - job_name: 'node' static_configs: - targets: ['localhost:9100'] EOF # 3. Start Prometheus ./prometheus --config.file=prometheus.yml # 4. Access UI: http://localhost:9090 ``` **Install Grafana**: ```bash # 1. Install Grafana sudo apt-get install -y software-properties-common sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main" wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add - sudo apt-get update sudo apt-get install grafana # 2. Start Grafana sudo systemctl start grafana-server sudo systemctl enable grafana-server # 3. Access UI: http://localhost:3000 (admin/admin) ``` ### Health Checks **Create health check script** (`/opt/motion-tracking/scripts/health_check.sh`): ```bash #!/bin/bash # Check if service is running if ! systemctl is-active --quiet motion-tracking; then echo "ERROR: Service not running" exit 1 fi # Check GPU availability if ! nvidia-smi > /dev/null 2>&1; then echo "ERROR: GPU not accessible" exit 1 fi # Check camera connectivity python3 <> /var/log/motion-tracking/health_check.log 2>&1 ``` ### Log Rotation ```bash # Configure logrotate sudo cat < /etc/logrotate.d/motion-tracking /var/log/motion-tracking/*.log { daily rotate 30 compress delaycompress missingok notifempty create 0640 motion-tracking motion-tracking sharedscripts postrotate systemctl reload motion-tracking > /dev/null 2>&1 || true endscript } EOF ``` --- ## Troubleshooting Guide ### Common Issues #### Issue 1: Camera Connection Failures **Symptoms**: - Cameras not detected - Connection timeouts - Packet loss **Diagnosis**: ```bash # Check network connectivity ping 192.168.10.10 # Check camera discovery arv-tool-0.8 -l # Check packet loss ping -c 100 -s 8000 192.168.10.10 ``` **Solutions**: ```bash # 1. Check network settings sudo ethtool eth0 # 2. Increase network buffers (see Network Configuration section) # 3. Verify jumbo frames ip link show eth0 | grep mtu # 4. Check firewall sudo ufw status # 5. Restart network sudo systemctl restart networking ``` #### Issue 2: Low FPS / High Latency **Symptoms**: - Processing FPS < 30 - Latency > 33ms - GPU utilization low **Diagnosis**: ```bash # Check GPU usage nvidia-smi dmon -s u # Check CPU usage htop # Check application metrics curl http://localhost:9090/metrics ``` **Solutions**: ```bash # 1. Enable hardware acceleration # Edit /opt/motion-tracking/config/system.yaml # Set use_hardware_accel: true # 2. Reduce buffer size # Set buffer_size: 30 (from 60) # 3. Check GPU clocks nvidia-smi -q -d CLOCK # 4. Increase processing threads # Edit config, set num_processing_threads: 16 # 5. Profile application python -m cProfile -o profile.stats main.py ``` #### Issue 3: Memory Issues **Symptoms**: - Out of memory errors - High swap usage - System instability **Diagnosis**: ```bash # Check memory usage free -h # Check GPU memory nvidia-smi # Check process memory ps aux --sort=-%mem | head -20 ``` **Solutions**: ```bash # 1. Reduce buffer sizes # Edit config: buffer_size: 30 # 2. Enable sparse voxel storage # Edit config: enable_sparse_storage: true # 3. Reduce number of camera pairs # Edit config: num_pairs: 5 # 4. Add swap space sudo dd if=/dev/zero of=/swapfile bs=1G count=32 sudo mkswap /swapfile sudo swapon /swapfile ``` #### Issue 4: Calibration Errors **Symptoms**: - High reprojection errors - Poor stereo matching - Fusion alignment issues **Diagnosis**: ```bash # Validate calibration python scripts/validate_calibration.py # Check calibration files cat /opt/calibration/camera0_intrinsic.json ``` **Solutions**: ```bash # 1. Recalibrate cameras python scripts/calibrate_camera.py ... # 2. Use more calibration images (50+) # 3. Ensure good lighting conditions # 4. Use larger calibration target # 5. Check camera mounting rigidity ``` #### Issue 5: Network Issues **Symptoms**: - Node disconnections - High latency between nodes - Data loss **Diagnosis**: ```bash # Check network latency ping -c 100 10.0.0.10 # Check network bandwidth iperf3 -c 10.0.0.10 -t 30 # Check network errors netstat -i ``` **Solutions**: ```bash # 1. Enable RDMA (if available) # Edit config: enable_rdma: true # 2. Adjust TCP settings (see Network Configuration) # 3. Check switch configuration # 4. Reduce network traffic on VLAN # 5. Use dedicated network for cluster ``` ### Diagnostic Commands ```bash # System health /opt/motion-tracking/scripts/health_check.sh # GPU status nvidia-smi # Service status sudo systemctl status motion-tracking # View logs sudo journalctl -u motion-tracking -f # Check metrics curl http://localhost:9090/metrics # Network diagnostics sudo netstat -tuln | grep LISTEN sudo ss -s # Process information ps aux | grep motion-tracking top -p $(pgrep -f motion-tracking) ``` --- ## Security Considerations ### Network Security ```bash # 1. Segment networks with VLANs # 2. Enable firewall sudo ufw enable # 3. Restrict SSH access sudo ufw limit ssh sudo ufw allow from 192.168.99.0/24 to any port 22 # 4. Disable unused services sudo systemctl disable bluetooth sudo systemctl disable cups ``` ### Application Security ```bash # 1. Run as non-root user sudo useradd -r -s /bin/false motion-tracking # 2. Set file permissions sudo chmod 700 /opt/motion-tracking/config sudo chmod 600 /opt/motion-tracking/config/*.yaml # 3. Encrypt calibration data gpg --encrypt /opt/calibration/camera0_intrinsic.json # 4. Enable audit logging sudo apt-get install auditd sudo auditctl -w /opt/motion-tracking -p wa ``` --- ## Backup and Recovery ### Backup Strategy ```bash # 1. Backup configuration tar czf config-$(date +%Y%m%d).tar.gz /opt/motion-tracking/config # 2. Backup calibration tar czf calibration-$(date +%Y%m%d).tar.gz /opt/calibration # 3. Backup data rsync -av /var/lib/motion-tracking /backup/ # 4. Automated backups cat < /opt/motion-tracking/scripts/backup.sh #!/bin/bash DATE=\$(date +%Y%m%d) tar czf /backup/config-\$DATE.tar.gz /opt/motion-tracking/config tar czf /backup/calibration-\$DATE.tar.gz /opt/calibration find /backup -name "*.tar.gz" -mtime +30 -delete EOF chmod +x /opt/motion-tracking/scripts/backup.sh # Add to crontab: Daily at 2 AM 0 2 * * * /opt/motion-tracking/scripts/backup.sh ``` ### Recovery Procedures ```bash # 1. Restore configuration tar xzf config-20240101.tar.gz -C / # 2. Restore calibration tar xzf calibration-20240101.tar.gz -C / # 3. Rebuild application cd /opt/motion-tracking python setup.py build_ext --inplace # 4. Restart service sudo systemctl restart motion-tracking ``` --- ## Deployment Checklist - [ ] Hardware installed and powered on - [ ] Network configured (IP addresses, MTU, VLANs) - [ ] Cameras mounted and connected - [ ] CUDA and drivers installed - [ ] Application installed and built - [ ] Configuration files created - [ ] Cameras calibrated (intrinsic + extrinsic) - [ ] Stereo pairs calibrated - [ ] System service configured and enabled - [ ] Firewall configured - [ ] Monitoring setup (Prometheus + Grafana) - [ ] Log rotation configured - [ ] Health checks configured - [ ] Backup system configured - [ ] Documentation reviewed - [ ] Performance testing completed - [ ] Security audit completed --- ## Support and Maintenance ### Regular Maintenance Tasks **Daily**: - Monitor system health dashboard - Check logs for errors - Verify camera connectivity **Weekly**: - Review performance metrics - Check disk space - Verify backups **Monthly**: - Update system packages - Review and optimize configuration - Test backup restoration - Recalibrate cameras (if needed) **Quarterly**: - Hardware inspection - Cable management review - Security audit - Performance benchmarking ### Contact Information For deployment support, please contact: - Technical Support: support@example.com - Emergency Hotline: +1-XXX-XXX-XXXX - Documentation: https://docs.example.com --- ## Appendix ### A. Network Bandwidth Calculation For 10 camera pairs at 8K resolution: ``` Frame size: 7680 x 4320 x 1 byte = 33.2 MB Frame rate: 30 FPS Cameras: 20 Bandwidth per camera: 33.2 MB × 30 FPS = 996 MB/s = 7.97 Gbps Total bandwidth: 7.97 Gbps × 20 = 159.4 Gbps Recommendation: 10GbE per 2 cameras, or 100 Gbps InfiniBand for full system ``` ### B. Storage Requirements ``` Recording rate: 33.2 MB/frame × 30 FPS × 20 cameras = 19.9 GB/s Per hour: 71.6 TB Per day: 1.72 PB Recommendation: - Real-time processing without storage - Record detections only (~100 MB/hour) - Archive calibration data (~10 GB) ``` ### C. Power Consumption ``` Master node: 600W Worker nodes: 700W × 4 = 2800W Cameras: 30W × 20 = 600W Network: 200W Total: 4200W UPS recommendation: 6000VA (1 hour runtime) ```