Frequently Asked Questions (FAQ)¶
Common questions about Veridex and AI content detection.
General Questions¶
What is Veridex?¶
Veridex is a Python library for detecting AI-generated content across text, images, and audio. Unlike binary classifiers, it provides probabilistic scores with confidence estimates and interpretable signals.
How accurate is Veridex?¶
Accuracy varies by detector and modality:
- Text: 70-90% depending on detector (BinocularsSignal highest)
- Image: 75-92% (DIRESignal highest for diffusion models)
- Audio: 80-95% (Wav2VecSignal highest)
Important
No AI detector is 100% accurate. Always use multiple signals and human judgment for critical decisions.
Can Veridex prove content is AI-generated?¶
No. Veridex provides probabilistic estimates, not definitive proof. It should not be used as sole evidence for: - Legal proceedings - Academic misconduct cases - Content moderation decisions
Always combine with other evidence and human review.
Is Veridex free to use?¶
Yes! Veridex is open-source under the Apache 2.0 license. You can: - ✓ Use commercially - ✓ Modify the code - ✓ Distribute copies - ✓ Use privately
See LICENSE for details.
Installation & Setup¶
What Python version do I need?¶
Python 3.9 or higher is required.
Which installation should I use?¶
Choose based on your needs:
# All modalities (recommended for evaluation)
pip install veridex[text,image,audio]
# Specific modality (production)
pip install veridex[text] # Text only
pip install veridex[image] # Image only
pip install veridex[audio] # Audio only
# Core only (minimal dependencies)
pip install veridex
How large are the model downloads?¶
| Detector | Model Size | First-Run Download |
|---|---|---|
PerplexitySignal |
~500MB | GPT-2 |
BinocularsSignal |
~7GB | GPT-2 + Falcon |
Wav2VecSignal |
~1.2GB | Wav2Vec 2.0 |
DIRESignal |
~4GB | Stable Diffusion |
| Others | None | No download |
Models are cached after first download.
Can I use a custom cache directory?¶
Yes! Set environment variables:
Usage Questions¶
What's the difference between score and confidence?¶
- Score (0.0-1.0): Probability that content is AI-generated
- Confidence (0.0-1.0): How certain the detector is about the score
Example:
result.score = 0.8 # 80% probability of AI
result.confidence = 0.9 # Very confident in this assessment
Low confidence means the detector is unsure → consider using additional signals.
How do I choose which detector to use?¶
Use this decision matrix:
Text:
- Speed priority → ZlibEntropySignal
- Accuracy priority → BinocularsSignal
- Balanced → PerplexitySignal
Image:
- Quick screening → FrequencySignal
- High accuracy → DIRESignal (requires GPU)
- Manipulation detection → ELASignal
Audio:
- Lightweight → SpectralSignal
- Anti-spoofing → AASISTSignal
- Production-grade → Wav2VecSignal
Can I combine multiple detectors?¶
Yes! This is recommended for better accuracy:
from veridex.text import PerplexitySignal, ZlibEntropySignal
text = "Your text..."
# Run multiple detectors
detectors = [PerplexitySignal(), ZlibEntropySignal()]
scores = [d.run(text).score for d in detectors]
# Average the scores
ensemble_score = sum(scores) / len(scores)
See Ensemble Detection Tutorial for advanced techniques.
What input formats are supported?¶
- Plain text strings
- Any encoding (UTF-8 recommended)
- Min length: ~10 words for reliable results
- Formats: PNG, JPEG, WebP, TIFF
- Any resolution (will be resized if needed)
- RGB or grayscale
- Formats: WAV, MP3, FLAC, OGG
- Sample rates: Any (will be resampled to 16kHz)
- Mono or stereo (converted to mono)
Performance Questions¶
Do I need a GPU?¶
Optional for most detectors:
| Detector | GPU Required | Speedup with GPU |
|---|---|---|
PerplexitySignal |
No | 2-3x faster |
BinocularsSignal |
No | 2-3x faster |
DIRESignal |
Yes | 10-20x faster |
Wav2VecSignal |
No | 5-10x faster |
| Others | No | Minimal |
DIRESignal (image) strongly recommended to use GPU.
How fast is Veridex?¶
Approximate processing times (CPU):
Text (1000 words):
- ZlibEntropySignal: <0.1s
- PerplexitySignal: 2-5s
- BinocularsSignal: 10-20s
Image (1024x1024):
- FrequencySignal: <1s
- ELASignal: <1s
- DIRESignal: 30-60s (GPU: 3-5s)
Audio (30 seconds):
- SpectralSignal: <1s
- AASISTSignal: 2-5s
- Wav2VecSignal: 10-20s
Can I batch process files?¶
Yes! Just loop through your files:
from veridex.text import PerplexitySignal
detector = PerplexitySignal()
texts = ["text1...", "text2...", "text3..."]
results = [detector.run(text) for text in texts]
For production, consider using async processing or multiprocessing.
Technical Questions¶
What ML frameworks does Veridex use?¶
- PyTorch (via transformers) for LLM-based detectors
- NumPy/SciPy for signal processing
- librosa for audio analysis
- PIL/OpenCV for image processing
Can I fine-tune the detectors?¶
Not currently. Detectors use pre-trained models. Future versions may support custom training.
Can I add my own detectors?¶
Yes! Extend the BaseSignal class:
from veridex.core import BaseSignal, DetectionResult
class MyCustomSignal(BaseSignal):
def run(self, input_data):
# Your detection logic here
score = 0.5 # Calculate score
confidence = 0.8
metadata = {"custom_metric": 123}
return DetectionResult(
score=score,
confidence=confidence,
metadata=metadata
)
See API Reference for details.
Does Veridex send data to external servers?¶
No. All processing happens locally. Models are downloaded fromHugging Face on first use and cached locally.
Limitations & Edge Cases¶
Can Veridex detect all AI-generated content?¶
No. Veridex has limitations:
- ❌ Cannot detect all AI models (trained on common ones)
- ❌ Vulnerable to adversarial attacks
- ❌ Accuracy degrades with post-processing/editing
- ❌ Newer/unknown models may evade detection
What about paraphrased AI content?¶
Detectors vary in robustness:
- BinocularsSignal: More robust to paraphrasing
- PerplexitySignal: Moderately robust
- ZlibEntropySignal: Less robust
Heavily paraphrased content may evade detection.
Can it detect AI-assisted (human-edited) content?¶
This is challenging. If a human significantly edits AI content, it becomes a hybrid that's hard to classify.
What about content from new AI models?¶
Detectors may not recognize very new models (e.g., GPT-5, future models). Regular updates are needed.
Error Messages¶
"ImportError: No module named 'transformers'"¶
Install the required dependencies:
pip install veridex[text] # For text detectors
pip install veridex[image] # For image detectors
pip install veridex[audio] # For audio detectors
"Model download failed"¶
Check your internet connection and try:
# Set custom cache and retry
export HF_HOME=/tmp/hf_cache
python -c "from veridex.text import PerplexitySignal; PerplexitySignal()"
"CUDA out of memory"¶
Reduce batch size or use CPU:
import torch
torch.cuda.empty_cache() # Free GPU memory
# Force CPU usage
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""
Getting Help¶
Where can I get support?¶
- 📖 Documentation
- 🐛 GitHub Issues - Bug reports
- 💬 GitHub Discussions - Questions
- 📧 Email - Direct contact
How do I report a bug?¶
- Check existing issues
- Create a new issue with:
- Veridex version (
pip show veridex) - Python version
- OS and system info
- Minimal code to reproduce
- Error message/traceback
Can I contribute?¶
Yes! See CONTRIBUTING.md for guidelines.