anthropic · claude-sonnet-4-5 · Feb 18, 09:12 PM · Asset 88Lb01q
Score 0.97 · Latency 6.71s · Cost $0.0109

# Workflow Eval Detail
Generates concise summaries and smart tags from your content—perfect for search, discovery, and quick recaps.
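For context, here is a minimal sketch of what one run of this workflow might look like. The `run_summary_workflow` helper, the prompt wording, and the JSON output shape are assumptions for illustration only, not the actual implementation.

```python
import json
from dataclasses import dataclass


@dataclass
class SummaryResult:
    summary: str     # concise recap of the source content
    tags: list[str]  # smart tags for search and discovery


def run_summary_workflow(content: str, complete) -> SummaryResult:
    """Ask a model for a short summary plus tags and parse the JSON reply.

    `complete` is any callable that sends a prompt to a provider and
    returns the raw text response (provider-agnostic on purpose).
    """
    prompt = (
        "Summarize the content below in 2-3 sentences, then list 3-7 short tags. "
        'Respond as JSON: {"summary": ..., "tags": [...]}\n\n'
        f"{content}"
    )
    raw = complete(prompt)
    parsed = json.loads(raw)
    return SummaryResult(summary=parsed["summary"], tags=parsed["tags"])
```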
Anthropic leads on summarization quality, OpenAI gpt-5.1 on latency, and OpenAI gpt-5-mini on cost, but the findings are based on only 5 runs and should be treated as directional.
Each eval run captures efficacy, efficiency, and expense. We use this data to compare providers and track regressions over time.
We score summary quality, tag relevance, and semantic similarity while tracking latency, token usage, and cost.
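As a rough illustration of what each run records, the sketch below shows one possible shape for a per-run result and how per-case metrics could be averaged into the table rows. The field names and the equal-weight blend of the three quality signals are assumptions, not the production schema.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EvalRun:
    provider: str
    model: str
    summary_quality: float      # 0..1, graded summary quality
    tag_relevance: float        # 0..1, graded tag relevance
    semantic_similarity: float  # 0..1, similarity to a reference summary
    latency_s: float
    tokens: int
    cost_usd: float

    @property
    def score(self) -> float:
        # Assumed equal-weight blend of the three quality signals.
        return mean([self.summary_quality, self.tag_relevance, self.semantic_similarity])


def aggregate(runs: list[EvalRun]) -> dict:
    """Average one provider/model's runs into a single table row."""
    return {
        "cases": len(runs),
        "avg_score": round(mean(r.score for r in runs), 2),
        "avg_latency_s": round(mean(r.latency_s for r in runs), 2),
        "avg_tokens": round(mean(r.tokens for r in runs)),
        "avg_cost_usd": round(mean(r.cost_usd for r in runs), 4),
    }
```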
| Provider | Model | Cases | Avg Score | Avg Latency | Avg Tokens | Avg Cost | Avg Cost / Min |
|---|---|---|---|---|---|---|---|
| anthropic | claude-sonnet-4-5 | 1 | 0.97 | 6.71s | 3,019 | $0.0109 | $0.019/min |
| google | gemini-2.5-flash | 1 | 0.96 | 8.31s | 2,408 | $0.0032 | $0.0056/min |
| google | gemini-3-flash-preview | 1 | 0.95 | 7.34s | 2,793 | $0.0023 | $0.004/min |
| openai | gpt-5-mini | 1 | 0.89 | 16.58s | 3,478 | $0.0016 | $0.0029/min |
| openai | gpt-5.1 | 1 | 0.95 | 5.09s | 1,828 | $0.0021 | $0.0036/min |
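To make the per-metric comparison above easy to reproduce, here is a small sketch that picks the leader for quality, latency, and cost from rows like those in the table. The row dictionaries mirror the table values; the ranking helper itself is illustrative only.

```python
rows = [
    {"provider": "anthropic", "model": "claude-sonnet-4-5", "score": 0.97, "latency_s": 6.71, "cost_usd": 0.0109},
    {"provider": "google", "model": "gemini-2.5-flash", "score": 0.96, "latency_s": 8.31, "cost_usd": 0.0032},
    {"provider": "google", "model": "gemini-3-flash-preview", "score": 0.95, "latency_s": 7.34, "cost_usd": 0.0023},
    {"provider": "openai", "model": "gpt-5-mini", "score": 0.89, "latency_s": 16.58, "cost_usd": 0.0016},
    {"provider": "openai", "model": "gpt-5.1", "score": 0.95, "latency_s": 5.09, "cost_usd": 0.0021},
]

leaders = {
    "quality": max(rows, key=lambda r: r["score"])["model"],      # claude-sonnet-4-5
    "latency": min(rows, key=lambda r: r["latency_s"])["model"],  # gpt-5.1
    "cost": min(rows, key=lambda r: r["cost_usd"])["model"],      # gpt-5-mini
}
print(leaders)
```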