openai · gpt-5.1 — Feb 18, 09:12 PM · Asset gEvCHSJ
Score: 0.71 · Latency: 2.73s · Cost: $0.0006
Workflow Eval Detail
Analyzes video frames to detect hardcoded captions baked into the visual content; useful for compliance checks and accessibility audits.
All providers perform very well on burned-in captions: gpt-5.1 offers the best latency and cost balance, while claude-sonnet-4-5 leads on pure quality at a higher cost.
Each eval run captures efficacy, efficiency, and expense. We use this data to compare providers and track regressions over time.
We evaluate caption detection accuracy, confidence calibration, and response integrity alongside speed and cost thresholds.
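The per-case scoring described above could be combined along these lines. This is an illustrative sketch only: the `EvalResult` fields, weights, and budget thresholds are assumptions, not the harness's actual rubric.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    caption_detected: bool  # did the model flag burned-in captions?
    expected: bool          # ground-truth label for the frame set
    confidence: float       # model-reported confidence, 0..1
    valid_json: bool        # response parsed against the expected schema
    latency_s: float
    cost_usd: float

def score(result: EvalResult,
          max_latency_s: float = 20.0,
          max_cost_usd: float = 0.01) -> float:
    """Blend accuracy, confidence calibration, and response integrity
    into a 0..1 score, zeroing out runs that blow the latency or cost budget."""
    if result.latency_s > max_latency_s or result.cost_usd > max_cost_usd:
        return 0.0
    accuracy = 1.0 if result.caption_detected == result.expected else 0.0
    # Calibration rewards high confidence when correct, low confidence when wrong.
    calibration = result.confidence if accuracy else 1.0 - result.confidence
    integrity = 1.0 if result.valid_json else 0.0
    # Weights are illustrative, not the production values.
    return 0.6 * accuracy + 0.25 * calibration + 0.15 * integrity
```

A correct, well-calibrated, schema-valid answer within budget scores near 1.0; a run that exceeds either threshold scores 0 regardless of accuracy.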
| Provider | Model | Cases | Avg Score | Avg Latency | Avg Tokens | Avg Cost | Avg Cost / Min |
|---|---|---|---|---|---|---|---|
| anthropic | claude-sonnet-4-5 | 3 | 1.00 | 2.78s | 2,448 | $0.0077 | $0.0269/min |
| google | gemini-2.5-flash | 3 | 0.99 | 5.09s | 1,867 | $0.0022 | $0.0077/min |
| google | gemini-3-flash-preview | 3 | 0.99 | 5.18s | 2,577 | $0.0021 | $0.0073/min |
| openai | gpt-5-mini | 3 | 0.91 | 15.58s | 2,974 | $0.0022 | $0.0078/min |
| openai | gpt-5.1 | 3 | 0.90 | 2.12s | 1,631 | $0.0006 | $0.0021/min |
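The Avg Cost / Min column appears to normalize per-run cost by the minutes of video analyzed. A minimal sketch of that derivation (the ~17 s clip length used below is back-computed from the table rows, not stated in the source):

```python
def cost_per_minute(avg_cost_usd: float, video_seconds: float) -> float:
    """Normalize the average per-run cost by the length of video analyzed,
    yielding a dollars-per-minute-of-footage figure."""
    return avg_cost_usd / (video_seconds / 60.0)

# Example: the claude-sonnet-4-5 row ($0.0077/run over ~17.2 s of video)
# reproduces the table's $0.0269/min figure.
rate = cost_per_minute(0.0077, 17.2)
```

Dividing by footage length rather than wall-clock latency makes the column comparable across providers with very different response times.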