anthropic ·claude-sonnet-4-5Feb 18, 09:12 PM
Asset 88Lb01q
Score
1
Latency
5.99s
Cost
$0.0109
Workflow Eval Detail
Answers natural-language questions about a video by retrieving relevant context and answering with a concise response.
All providers perform strongly on this workflow with very high accuracy and low cost, and OpenAI gpt-5.1 is the recommended default for quality and latency, pending confirmation on a larger sample.
Each eval run captures efficacy, efficiency, and expense. We use this data to compare providers and track regressions over time.
We score answer accuracy and response integrity while tracking latency, token usage, and cost.
| Provider | Model | Cases | Avg Score | Avg Latency | Avg Tokens | Avg Cost | Avg Cost / Min |
|---|---|---|---|---|---|---|---|
| anthropic | claude-sonnet-4-5 | 1 | 1 | 5.99s | 2,856 | $0.0109 | $0.0189/min |
| gemini-2.5-flash | 1 | 1 | 5.28s | 1,504 | $0.0015 | $0.0027/min | |
| gemini-3-flash-preview | 1 | 1 | 7.56s | 2,531 | $0.0022 | $0.0039/min | |
| openai | gpt-5-mini | 1 | 0.92 | 13.2s | 3,216 | $0.0015 | $0.0027/min |
| openai | gpt-5.1 | 1 | 1 | 3.55s | 1,555 | $0.0031 | $0.0055/min |