Video Signals¶
Video deepfake detection analyzes temporal patterns, biological signals, and audio-visual synchronization to identify synthetic content.
Overview¶
Veridex provides three specialized video detection signals, plus an ensemble that fuses them:
- RPPGSignal: Detects biological heartbeat patterns in facial video
- I3DSignal: Analyzes spatiotemporal motion features
- LipSyncSignal: Checks audio-visual synchronization
- VideoEnsemble: Combines all three with confidence-weighted fusion
Core Concepts¶
Biological Analysis (RPPG)¶
Remote photoplethysmography (rPPG) extracts heartbeat signals from subtle color changes in facial skin. Deepfakes often lack these biological rhythms.
Strengths: Biological signals are difficult for generators to reproduce
Limitations: Requires a clearly visible face and good lighting
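To illustrate the idea behind rPPG (this is a conceptual sketch, not the library's implementation), a heartbeat can be recovered as the dominant frequency in the per-frame mean green-channel intensity of facial skin. The function name `estimate_heart_rate` and the synthetic trace are illustrative:

```python
import math

def estimate_heart_rate(green_means, fps):
    """Estimate heart rate (BPM) from per-frame mean green values
    by scanning plausible pulse frequencies with a naive DFT."""
    n = len(green_means)
    mean = sum(green_means) / n
    signal = [v - mean for v in green_means]  # remove the DC component
    best_freq, best_power = 0.0, 0.0
    freq = 0.7  # scan 0.7-4.0 Hz, i.e. 42-240 BPM
    while freq <= 4.0:
        re = sum(s * math.cos(2 * math.pi * freq * i / fps) for i, s in enumerate(signal))
        im = sum(s * math.sin(2 * math.pi * freq * i / fps) for i, s in enumerate(signal))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
        freq += 0.05
    return best_freq * 60  # Hz -> beats per minute

# Synthetic example: a 1.2 Hz (72 BPM) pulse buried in a green-channel trace
fps = 30
trace = [0.5 + 0.01 * math.sin(2 * math.pi * 1.2 * i / fps) for i in range(300)]
bpm = estimate_heart_rate(trace, fps)
```

A real video without this periodic component (as deepfakes often are) yields no clear spectral peak, which is the cue the rPPG signal exploits.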
Spatiotemporal Analysis (I3D)¶
Inception-3D networks learn motion patterns across space and time. Synthetic videos exhibit different temporal statistics than real ones.
Strengths: Works on full frames; does not require a visible face
Limitations: Requires minimum 64 frames
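Because 3D CNNs consume fixed-length clips, a video is typically split into overlapping windows, padding short inputs up to the 64-frame minimum. A minimal sketch of that preprocessing step (the helper `make_clips` is illustrative, not part of the Veridex API):

```python
def make_clips(frames, clip_len=64, stride=32):
    """Split a frame sequence into overlapping fixed-length clips
    for a 3D CNN. Pads short videos by repeating the final frame."""
    if len(frames) < clip_len:
        frames = frames + [frames[-1]] * (clip_len - len(frames))
    clips = []
    for start in range(0, len(frames) - clip_len + 1, stride):
        clips.append(frames[start:start + clip_len])
    return clips

# 100 "frames" (indices stand in for frame arrays) -> two 64-frame clips
clips = make_clips(list(range(100)))
```

Per-clip scores would then be aggregated (e.g. averaged) into one video-level score.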
Audio-Visual Synchronization (LipSync)¶
SyncNet measures the correspondence between the audio track and lip movements. Poor synchronization can indicate dubbing or voice cloning.
Strengths: Effective for audio manipulation
Limitations: Needs both audio and video
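The core idea can be sketched with simple cross-correlation: align an audio-energy trace against a mouth-openness trace over a range of frame shifts, and flag videos whose best alignment is weak or far from zero. The function `sync_score` below is a stdlib-only illustration under those assumptions, not SyncNet itself:

```python
import math

def sync_score(audio_energy, mouth_openness, max_shift=5):
    """Return (best_shift, correlation): the frame shift that best aligns
    audio energy with mouth openness, and the Pearson correlation there.
    A low peak correlation or a large shift suggests manipulated audio."""
    def corr(a, b):
        n = min(len(a), len(b))
        a, b = a[:n], b[:n]
        ma, mb = sum(a) / n, sum(b) / n
        num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        da = sum((x - ma) ** 2 for x in a) ** 0.5
        db = sum((y - mb) ** 2 for y in b) ** 0.5
        return num / (da * db) if da and db else 0.0

    best_shift, best_corr = 0, -1.0
    for shift in range(-max_shift, max_shift + 1):
        if shift >= 0:
            c = corr(audio_energy[shift:], mouth_openness)
        else:
            c = corr(audio_energy, mouth_openness[-shift:])
        if c > best_corr:
            best_shift, best_corr = shift, c
    return best_shift, best_corr

# Synthetic example: mouth openness lags the audio by 3 frames
audio = [abs(math.sin(0.3 * i)) for i in range(100)]
mouth = [0.0] * 3 + audio[:-3]
shift, score = sync_score(audio, mouth)
```

SyncNet replaces the hand-crafted traces with learned audio and visual embeddings, but the alignment-and-score logic is analogous.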
Ensemble Approach¶
The VideoEnsemble combines all three signals using weighted averaging, where more confident signals have higher influence.
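Confidence-weighted averaging can be sketched in a few lines. This is a conceptual illustration, not VideoEnsemble's actual code; the function name `fuse` and the (score, confidence) tuple convention are assumptions:

```python
def fuse(results):
    """Confidence-weighted fusion of per-signal fake scores.
    `results` maps signal name -> (score in [0, 1], confidence in [0, 1]).
    Signals that failed (confidence 0) drop out of the average."""
    total_weight = sum(conf for _, conf in results.values())
    if total_weight == 0:
        return None  # every signal failed; no verdict possible
    return sum(score * conf for score, conf in results.values()) / total_weight

verdict = fuse({
    "rppg": (0.9, 0.8),     # strong fake evidence from a reliable face track
    "i3d": (0.6, 0.5),      # weaker, less confident motion evidence
    "lipsync": (0.7, 0.0),  # no audio stream -> zero confidence, ignored
})
```

Because a failed signal simply carries zero weight, the ensemble still produces a verdict from whichever signals succeeded.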
Benefits¶
- Robust: Works even if some signals fail
- Accurate: Fusion typically outperforms any single signal
- Transparent: Shows individual signal results
Usage¶
See the Video Detection Guide for detailed usage examples.
Technical Details¶
- Face Detection: MediaPipe (preferred) or Haar Cascades (fallback)
- Model Weights: The default weight URLs are placeholders; real trained weights must be supplied
- Performance: ~5-8 FPS on CPU, 5-10x faster on GPU
Research¶
Based on:
- PhysNet (Yu et al., 2019) for rPPG
- Inception-I3D (Carreira & Zisserman, 2017) for motion
- SyncNet (Chung & Zisserman, 2016) for lip-sync