Performance Optimization¶
Guide to optimizing Veridex for speed and resource efficiency.
Performance Overview¶
Veridex performance varies by detector and hardware:
| Factor | Impact | Optimization |
|---|---|---|
| GPU availability | 2-20x speedup | Enable CUDA for compatible detectors |
| Model size | Affects load time & memory | Choose smaller models when possible |
| Batch size | Linear scaling | Process multiple items together |
| Caching | Faster subsequent runs | Reuse detector instances |
Hardware Recommendations¶
Minimum Requirements¶
- CPU: 2+ cores, 2.0+ GHz
- RAM: 4GB (8GB for image/audio)
- Storage: 10GB free (for model cache)
- GPU: Optional (CPU-only works)
Recommended for Production¶
- CPU: 4+ cores, 3.0+ GHz (or equivalent)
- RAM: 16GB+
- Storage: 50GB free SSD
- GPU: NVIDIA GPU with 6GB+ VRAM (for DIRESignal, Wav2VecSignal)
Detector Performance Comparison¶
Text Detectors¶
| Detector | Speed (1000 words) | Memory | GPU Benefit |
|---|---|---|---|
| ZlibEntropySignal | <0.1s | ~50MB | ❌ None |
| StylometricSignal | <0.1s | ~100MB | ❌ None |
| PerplexitySignal | 2-5s (CPU), 0.5-1s (GPU) | ~2GB | ✅ 3-5x |
| BinocularsSignal | 10-20s (CPU), 2-4s (GPU) | ~8GB | ✅ 4-6x |
Image Detectors¶
| Detector | Speed (1024x1024) | Memory | GPU Benefit |
|---|---|---|---|
| FrequencySignal | <1s | ~200MB | ❌ Minimal |
| ELASignal | <1s | ~200MB | ❌ Minimal |
| DIRESignal | 30-60s (CPU), 3-5s (GPU) | ~5GB | ✅ 10-20x |
Audio Detectors¶
| Detector | Speed (30s audio) | Memory | GPU Benefit |
|---|---|---|---|
| SpectralSignal | <1s | ~100MB | ❌ None |
| SilenceSignal | <1s | ~100MB | ❌ None |
| AASISTSignal | 2-5s | ~1GB | ✅ 2-3x |
| Wav2VecSignal | 10-20s (CPU), 2-4s (GPU) | ~3GB | ✅ 5-10x |
Optimization Strategies¶
1. Choose the Right Detector¶
Use fast detectors for initial screening, expensive ones for confirmation:
from veridex.text import ZlibEntropySignal, PerplexitySignal

def smart_text_detection(text):
    # Quick filter first
    quick_detector = ZlibEntropySignal()
    quick_result = quick_detector.run(text)

    # Only run the expensive detector if needed
    if quick_result.score < 0.4:
        return quick_result  # Clearly human, skip the expensive check

    # Run the accurate detector
    accurate_detector = PerplexitySignal()
    return accurate_detector.run(text)
Speedup: 3-5x for mostly human content
2. Reuse Detector Instances¶
❌ Slow (creates new detector each time):
for text in texts:
    detector = PerplexitySignal()  # Model loaded every time!
    result = detector.run(text)
✅ Fast (reuse detector):
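detector = PerplexitySignal()  # Model loaded once
for text in texts:
    result = detector.run(text)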
Speedup: 2-10x (avoids model reload)
3. Enable GPU Acceleration¶
Check if GPU is available:
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA device: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")
Install GPU-enabled PyTorch:
# CUDA 11.8
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.1
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
GPU usage:
from veridex.text import PerplexitySignal

# Model-backed detectors automatically use the GPU when CUDA is available;
# no extra configuration is needed
detector = PerplexitySignal()
Force CPU (if needed):
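One way to do this (shown here as a general PyTorch approach, not a Veridex-specific option) is to hide CUDA devices before the detector is created:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""  # Hide all GPUs so torch falls back to CPU

from veridex.text import PerplexitySignal

detector = PerplexitySignal()  # Runs on CPU since no CUDA device is visible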
4. Batch Processing¶
For multiple items, create the detector once and reuse it across the whole batch:
from veridex.text import PerplexitySignal

texts = ["Text 1...", "Text 2...", "Text 3..."]

# ❌ Slow: a new detector (and model load) for every item
results = [PerplexitySignal().run(text) for text in texts]

# ✅ Fast: one detector, reused for every item
detector = PerplexitySignal()
results = [detector.run(text) for text in texts]
For parallel processing:
from multiprocessing import Pool
from veridex.text import ZlibEntropySignal

def detect_text(text):
    detector = ZlibEntropySignal()  # Lightweight, OK to create per call
    return detector.run(text)

if __name__ == '__main__':
    texts = ["Text 1...", "Text 2...", ...]
    with Pool(processes=4) as pool:
        results = pool.map(detect_text, texts)
Note: Only use multiprocessing with lightweight, CPU-only detectors; GPU-backed detectors can conflict when multiple processes compete for the same device.
5. Optimize Model Cache¶
Set custom cache location:
# Set in shell
export HF_HOME=/fast/ssd/hf_cache
export TRANSFORMERS_CACHE=/fast/ssd/hf_cache
# Or in Python
import os
os.environ['HF_HOME'] = '/fast/ssd/hf_cache'
Pre-download models:
from transformers import AutoModel, AutoTokenizer
# Download during setup, not runtime
models = ["gpt2", "facebook/wav2vec2-base"]
for model_id in models:
    AutoTokenizer.from_pretrained(model_id)
    AutoModel.from_pretrained(model_id)
6. Reduce Model Size¶
Use smaller models for faster inference:
# ❌ Large model (slow but accurate)
detector = PerplexitySignal(model_id="gpt2-large")
# ✅ Smaller model (faster, slightly less accurate)
detector = PerplexitySignal(model_id="distilgpt2")
Model size comparison:
| Model | Size | Speed | Accuracy |
|---|---|---|---|
| distilgpt2 | ~350MB | Fast | Good |
| gpt2 | ~500MB | Medium | Better |
| gpt2-large | ~3GB | Slow | Best |
7. Memory Management¶
Clear GPU cache:
import torch
import gc
# After processing batch
results = process_batch(texts)
# Clear memory
torch.cuda.empty_cache()
gc.collect()
Limit memory usage:
# Limit this process to at most 80% of GPU memory
import torch
torch.cuda.set_per_process_memory_fraction(0.8)
Production Deployment Patterns¶
Pattern 1: Two-Stage Pipeline¶
from veridex.text import ZlibEntropySignal, PerplexitySignal

class TwoStageDetector:
    def __init__(self):
        self.stage1 = ZlibEntropySignal()  # Fast filter
        self.stage2 = PerplexitySignal()   # Accurate detector

    def detect(self, text):
        # Stage 1: quick filter
        quick_result = self.stage1.run(text)

        if quick_result.score < 0.3:
            # Clearly human, skip stage 2
            return quick_result
        elif quick_result.score > 0.8:
            # Clearly AI, skip stage 2
            return quick_result
        else:
            # Uncertain, run stage 2
            return self.stage2.run(text)
Benefit: 50-70% reduction in expensive detector calls
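A minimal usage sketch (documents here is a placeholder for your own iterable of strings):
detector = TwoStageDetector()
for text in documents:
    result = detector.detect(text)
    print(result.score)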
Pattern 2: Caching Results¶
from functools import lru_cache
import hashlib

from veridex.text import PerplexitySignal

class CachedDetector:
    def __init__(self):
        self.detector = PerplexitySignal()

    def detect(self, text):
        text_hash = hashlib.sha256(text.encode()).hexdigest()
        return self._cached_detect(text_hash, text)

    @lru_cache(maxsize=1000)
    def _cached_detect(self, text_hash, text):
        # The hash and text together form the cache key; identical text hits the cache
        return self.detector.run(text)
Benefit: Instant return for repeated content
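A quick way to see the effect (timings below are illustrative, not measured):
import time

detector = CachedDetector()

start = time.time()
detector.detect("Repeated submission text...")  # First call runs the model
print(f"First call: {time.time() - start:.3f}s")

start = time.time()
detector.detect("Repeated submission text...")  # Second call is served from the cache
print(f"Cached call: {time.time() - start:.6f}s")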
Pattern 3: Async Processing¶
import asyncio
from concurrent.futures import ThreadPoolExecutor

from veridex.text import PerplexitySignal

class AsyncDetector:
    def __init__(self):
        self.detector = PerplexitySignal()
        self.executor = ThreadPoolExecutor(max_workers=4)

    async def detect_async(self, text):
        loop = asyncio.get_running_loop()
        result = await loop.run_in_executor(
            self.executor,
            self.detector.run,
            text
        )
        return result

    async def detect_batch_async(self, texts):
        tasks = [self.detect_async(text) for text in texts]
        return await asyncio.gather(*tasks)

# Usage
async def main():
    texts = ["Text 1...", "Text 2...", "Text 3..."]
    detector = AsyncDetector()
    results = await detector.detect_batch_async(texts)

asyncio.run(main())
Benchmarking¶
Measure Your Performance¶
import time

def benchmark_detector(detector, inputs, num_runs=10):
    times = []

    # Warmup
    detector.run(inputs[0])

    # Benchmark
    for _ in range(num_runs):
        start = time.time()
        for inp in inputs:
            detector.run(inp)
        end = time.time()
        times.append(end - start)

    avg_time = sum(times) / len(times)
    throughput = len(inputs) * num_runs / sum(times)

    print(f"Average time: {avg_time:.2f}s")
    print(f"Throughput: {throughput:.2f} items/s")
    print(f"Per-item: {avg_time / len(inputs) * 1000:.2f}ms")

# Example
from veridex.text import PerplexitySignal

detector = PerplexitySignal()
texts = ["Sample text..." for _ in range(10)]
benchmark_detector(detector, texts)
Cloud Deployment Recommendations¶
AWS¶
# Recommended EC2 instances

# CPU-only:
- Instance: c6i.xlarge
  vCPU: 4
  RAM: 8 GB
  Cost: ~$0.17/hr
  Use: Text detection (lightweight)

# GPU:
- Instance: g4dn.xlarge
  vCPU: 4
  RAM: 16 GB
  GPU: 1x NVIDIA T4 (16GB)
  Cost: ~$0.526/hr
  Use: Image/Audio detection
Google Cloud¶
# Recommended GCE instances

# CPU-only:
- Instance: n2-standard-4
  vCPU: 4
  RAM: 16 GB
  Cost: ~$0.19/hr

# GPU:
- Instance: n1-standard-4 + T4
  vCPU: 4
  RAM: 15 GB
  GPU: 1x NVIDIA T4
  Cost: ~$0.45/hr
Docker Optimization¶
# Optimized Dockerfile
FROM python:3.10-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    libsndfile1 \
    && rm -rf /var/lib/apt/lists/*

# Install with specific extras
RUN pip install --no-cache-dir "veridex[text,audio]"

# Set the cache location before pre-downloading so models land in the cache volume
ENV HF_HOME=/app/cache

# Pre-download models (optional)
RUN python -c "from transformers import AutoModel; AutoModel.from_pretrained('gpt2')"

VOLUME /app/cache

CMD ["python", "app.py"]
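To keep downloaded models between container runs, mount a volume at the cache path (the image and volume names below are placeholders):
# Build the image
docker build -t veridex-app .
# Run with a named volume so the model cache persists
docker run -v veridex-cache:/app/cache veridex-app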
Monitoring & Profiling¶
Track Performance Metrics¶
import time
import psutil

class PerformanceMonitor:
    def __init__(self):
        self.metrics = []

    def monitor(self, detector, input_data):
        start_time = time.time()
        start_memory = psutil.Process().memory_info().rss / 1024**2  # MB

        result = detector.run(input_data)

        end_time = time.time()
        end_memory = psutil.Process().memory_info().rss / 1024**2

        metrics = {
            'latency_ms': (end_time - start_time) * 1000,
            'memory_mb': end_memory - start_memory,
            'detector': detector.__class__.__name__
        }
        self.metrics.append(metrics)
        return result

    def report(self):
        import pandas as pd
        df = pd.DataFrame(self.metrics)
        print(df.groupby('detector').agg({
            'latency_ms': ['mean', 'std', 'min', 'max'],
            'memory_mb': ['mean', 'max']
        }))
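A minimal usage sketch, wrapping the detectors used earlier in this guide:
from veridex.text import ZlibEntropySignal, PerplexitySignal

monitor = PerformanceMonitor()
texts = ["Sample text..." for _ in range(10)]

for detector in (ZlibEntropySignal(), PerplexitySignal()):
    for text in texts:
        monitor.monitor(detector, text)

monitor.report()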
Next Steps¶
- Use Cases - Real-world applications
- API Reference - Complete API documentation
- Troubleshooting - Common issues
- FAQ - Frequently asked questions