Workflow Eval Detail

anthropic · claude-sonnet-4-5 · May 18, 06:01 PM · Asset gEvCHSJ · Score: 1 · Latency: 2.2s · Cost: $0.0104
Analyzes video frames to detect hardcoded captions, i.e. text baked into the picture itself rather than delivered as a separate subtitle track. Useful for compliance checks and accessibility audits.
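Caption detection is near-perfect across all providers (average scores of 0.9 to 1) at very low cost. Google's flash-lite models lead on latency (1.63s and 1.97s) and tie openai gpt-5.1 for the lowest cost ($0.0007 per case). Each model has been tested on only three cases, so these averages are preliminary.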
Each eval run captures efficacy, efficiency, and expense. We use this data to compare providers and track regressions over time.
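As an illustration of regression tracking, the sketch below compares the mean score, latency, and cost of a new batch of runs against a baseline batch for the same model. The `EvalRun` record, the `find_regressions` helper, and its thresholds are hypothetical and are not this dashboard's actual data model; the baseline values come from the gpt-5.1 row of the results table, and the follow-up run is made up.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EvalRun:
    """One run of one model on one test case (hypothetical field names)."""
    model: str
    score: float      # efficacy, 0..1
    latency_s: float  # efficiency
    tokens: int       # efficiency
    cost_usd: float   # expense


def find_regressions(baseline, current, max_score_drop=0.05,
                     max_latency_ratio=1.5, max_cost_ratio=1.5):
    """Return the metrics on which `current` runs are worse than `baseline` runs.

    Thresholds are illustrative defaults, not values used by this dashboard.
    """
    flags = []
    if mean(r.score for r in baseline) - mean(r.score for r in current) > max_score_drop:
        flags.append("score")
    if mean(r.latency_s for r in current) > max_latency_ratio * mean(r.latency_s for r in baseline):
        flags.append("latency")
    if mean(r.cost_usd for r in current) > max_cost_ratio * mean(r.cost_usd for r in baseline):
        flags.append("cost")
    return flags


# Baseline from the table; the "current" run is a made-up example.
baseline = [EvalRun("gpt-5.1", 1.0, 2.17, 2279, 0.0007)]
current = [EvalRun("gpt-5.1", 0.9, 3.5, 2300, 0.0007)]
print(find_regressions(baseline, current))  # ['score', 'latency']
```

Comparing batch means rather than single runs keeps one noisy case from triggering an alert, although with only three cases per model the means are still sensitive to outliers.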
We evaluate caption detection accuracy, confidence calibration, and response integrity alongside speed and cost thresholds.
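One way to turn these criteria into a per-case score is sketched below, assuming the model returns JSON of the form `{"has_captions": bool, "confidence": float}`. That schema, the 0.8/0.2 weighting, the use of a Brier score for calibration, and the latency and cost limits are all assumptions for illustration; the page does not document how it combines these checks.

```python
import json


def parse_response(raw):
    """Response-integrity check for a hypothetical schema:
    {"has_captions": bool, "confidence": float in [0, 1]}.
    Returns (has_captions, confidence), or None if the response is malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    has_captions = data.get("has_captions")
    confidence = data.get("confidence")
    if not isinstance(has_captions, bool):
        return None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return None
    if not 0.0 <= confidence <= 1.0:
        return None
    return has_captions, float(confidence)


def score_case(raw, label, latency_s, cost_usd,
               max_latency_s=10.0, max_cost_usd=0.02,
               w_accuracy=0.8, w_calibration=0.2):
    """Score one case; weights and thresholds are illustrative assumptions."""
    within = latency_s <= max_latency_s and cost_usd <= max_cost_usd
    parsed = parse_response(raw)
    if parsed is None:
        # A malformed response fails integrity and scores zero.
        return {"score": 0.0, "integrity": False, "within_thresholds": within}
    has_captions, confidence = parsed
    accuracy = 1.0 if has_captions == label else 0.0
    # Probability the model assigns to "captions present".
    p_captions = confidence if has_captions else 1.0 - confidence
    calibration = 1.0 - (p_captions - float(label)) ** 2  # 1 - Brier score
    score = w_accuracy * accuracy + w_calibration * calibration
    return {"score": score, "integrity": True, "within_thresholds": within}
```

For example, a correct answer with confidence 0.9 scores 0.8 + 0.2 × 0.99 = 0.998, while a wrong answer with confidence 0.8 scores only 0.2 × 0.36 = 0.072. Weighting accuracy heavily keeps detection dominant while still rewarding well-calibrated confidence, and scoring malformed output as zero makes response integrity a hard requirement.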
| Provider | Model | Cases | Avg Score | Avg Latency | Avg Tokens | Avg Cost | Avg Cost / Min |
|---|---|---|---|---|---|---|---|
| anthropic | claude-sonnet-4-5 | 3 | 1 | 2.6s | 3,179 | $0.0099 | $0.0345/min |
| google | gemini-2.5-flash | 3 | 0.99 | 5.78s | 2,580 | $0.0026 | $0.0091/min |
| google | gemini-3-flash-preview | 3 | 1 | 4.16s | 3,294 | $0.0026 | $0.0090/min |
| google | gemini-3.1-flash-lite | 3 | 1 | 1.97s | 2,624 | $0.0007 | $0.0024/min |
| google | gemini-3.1-flash-lite-preview | 3 | 1 | 1.63s | 2,624 | $0.0007 | $0.0024/min |
| openai | gpt-5-mini | 3 | 0.9 | 17.57s | 3,649 | $0.0027 | $0.0095/min |
| openai | gpt-5.1 | 3 | 1 | 2.17s | 2,279 | $0.0007 | $0.0025/min |
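Assuming Avg Cost / Min is the average per-case cost divided by the minutes of video in each case (the page does not define it), dividing the two cost columns recovers the implied clip length. The sketch below does that arithmetic on the figures as displayed; `implied_minutes` is a hypothetical helper name.

```python
# (model, avg cost per case in USD, avg cost per minute in USD), copied from the table
ROWS = [
    ("claude-sonnet-4-5", 0.0099, 0.0345),
    ("gemini-2.5-flash", 0.0026, 0.0091),
    ("gemini-3-flash-preview", 0.0026, 0.0090),
    ("gemini-3.1-flash-lite", 0.0007, 0.0024),
    ("gemini-3.1-flash-lite-preview", 0.0007, 0.0024),
    ("gpt-5-mini", 0.0027, 0.0095),
    ("gpt-5.1", 0.0007, 0.0025),
]


def implied_minutes(cost_per_case, cost_per_min):
    """Invert cost/min = cost per case / video minutes per case."""
    return cost_per_case / cost_per_min


for model, cost, per_min in ROWS:
    mins = implied_minutes(cost, per_min)
    print(f"{model:30s} {mins:.3f} min ({mins * 60:.1f} s)")
```

Every row comes out at roughly 0.28 to 0.29 minutes (about 17 seconds), which suggests the same short clips were used for every model; the small spread between rows comes from the table's rounding to four decimal places.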