anthropic · claude-sonnet-4-5 · May 18, 06:01 PM
Asset 88Lb01q · Score: 1 · Latency: 6.17s · Cost: $0.0132
Workflow Eval Detail
Generates concise summaries and smart tags from your content—perfect for search, discovery, and quick recaps.
Summarization is high-quality and low-cost across providers, with Anthropic best on quality and Google best on speed and cost. These conclusions are tentative given only 7 cases.
Each eval run captures efficacy, efficiency, and expense. We use this data to compare providers and track regressions over time.
We score summary quality, tag relevance, and semantic similarity while tracking latency, token usage, and cost.
| Provider | Model | Cases | Avg Score | Avg Latency | Avg Tokens | Avg Cost | Avg Cost / Min |
|---|---|---|---|---|---|---|---|
| anthropic | claude-sonnet-4-5 | 4 | 0.99 | 6.06s | 3,838 | $0.0132 | $0.0229/min |
| google | gemini-2.5-flash | 4 | 0.99 | 7.41s | 3,041 | $0.0029 | $0.0064/min |
| google | gemini-3-flash-preview | 4 | 0.97 | 7.82s | 4,339 | $0.005 | $0.011/min |
| google | gemini-3.1-flash-lite | 4 | 0.99 | 2.37s | 2,986 | $0.0009 | $0.0016/min |
| google | gemini-3.1-flash-lite-preview | 4 | 0.99 | 2.69s | 2,985 | $0.0009 | $0.0016/min |
| openai | gpt-5-mini | 4 | 0.92 | 16.48s | 4,522 | $0.0024 | $0.0047/min |
| openai | gpt-5.1 | 3 | 0.99 | 4.91s | 2,509 | $0.0036 | $0.0072/min |
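
The per-model averages in the table above could be derived from individual eval runs roughly as sketched below. This is a minimal illustration, not the tool's actual implementation: the `EvalRun` record, its field names, and the `summarize` helper are all hypothetical, and the sketch covers only the case count, score, latency, token, and cost columns.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class EvalRun:
    """One eval run. Field names are assumptions, not the tool's schema."""
    provider: str
    model: str
    score: float      # quality score in [0, 1]
    latency_s: float  # wall-clock latency in seconds
    tokens: int       # total tokens used by the run
    cost_usd: float   # cost of the run in USD


def summarize(runs):
    """Group runs by (provider, model) and average each tracked metric."""
    groups = defaultdict(list)
    for run in runs:
        groups[(run.provider, run.model)].append(run)

    rows = []
    for (provider, model), items in sorted(groups.items()):
        rows.append({
            "provider": provider,
            "model": model,
            "cases": len(items),
            "avg_score": mean(r.score for r in items),
            "avg_latency_s": mean(r.latency_s for r in items),
            "avg_tokens": mean(r.tokens for r in items),
            "avg_cost_usd": mean(r.cost_usd for r in items),
        })
    return rows
```

Grouping on the `(provider, model)` pair keeps one row per model even when a provider has several models under test, which matches the layout of the table.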