Workflow Eval Detail

Chapters

Automatically segments long-form video content into navigable chapters with timestamps and titles, so viewers can jump to key moments instantly.

Latest Run: completed
muxinc/ai
main · c15880c · @mux/ai v0.22.0
Cases: 32
Avg Score: 0.97
Avg Latency: 6.72s
Avg Cost: $0.0048
Avg Cost / Min: $0.0005/min
Avg Tokens: 4,338
TL;DR

The Chapters workflow shows consistently high quality and low cost across providers, with Google's gemini-3.1-flash-lite and its preview variant emerging as the leading options on this 14-run sample.

Best Quality: google gemini-3.1-flash-lite
Fastest: google gemini-3.1-flash-lite-preview
Most Economical: google gemini-3.1-flash-lite-preview

What we measure

Each eval run captures efficacy, efficiency, and expense. We use this data to compare providers and track regressions over time.

Efficacy: Quality + correctness
Efficiency: Latency + token usage
Expense: Cost per request

Workflow snapshot

Suite status: success
Suite average score: 0.96
Suite duration: 2 minutes 2 seconds
Last suite run: May 18, 05:58 PM

Evaluation criteria

From eval tests

We evaluate chapter segmentation quality, timestamp accuracy, and title relevance alongside latency and cost metrics.

Temporal Segmentation example: AI Chapters, 6 segments, timeline 0:00 to 20:00
  1. 0:00 Introduction & Welcome
  2. 2:22 Setting Up Your Environment
  3. 6:25 Core Concepts Deep Dive
  4. 10:12 Building the First Feature
  5. 14:51 Testing & Debugging
  6. 18:44 Deployment & Next Steps
Language: en
Efficacy checks
  • Chapter titles are non-empty and descriptive.
  • Start times are valid and monotonically increasing.
  • Chapter count is reasonable for video duration.
  • Language code matches expected ISO 639-1 format.
  • Semantic coherence between chapter titles and content.
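The structural checks above can be sketched in a few lines of Python. This is a minimal illustration, not the eval suite's actual code: the `Chapter` type, the `check_chapters` function, the "strictly increasing" reading of the start-time rule, and the one-chapter-per-minute ceiling are all assumptions made for the example. Semantic coherence is left out because it needs a model or human judgment rather than a rule.

```python
import re
from dataclasses import dataclass


@dataclass
class Chapter:
    # Hypothetical shape for one chapter; field names are assumptions.
    start_seconds: float
    title: str


def check_chapters(chapters, duration_seconds, language_code, max_per_minute=1.0):
    """Return a dict mapping each structural check to True or False."""
    starts = [c.start_seconds for c in chapters]
    # Assumed ceiling: at most `max_per_minute` chapters per minute of video.
    max_chapters = max(1, duration_seconds / 60 * max_per_minute)
    return {
        "titles_non_empty": all(c.title.strip() for c in chapters),
        "starts_valid": all(0 <= s < duration_seconds for s in starts),
        # "Monotonically increasing" is read here as strictly increasing.
        "starts_increasing": all(a < b for a, b in zip(starts, starts[1:])),
        "count_reasonable": 1 <= len(chapters) <= max_chapters,
        # Format check only: two lowercase letters, not a lookup in the ISO list.
        "language_iso_639_1": re.fullmatch(r"[a-z]{2}", language_code) is not None,
    }


# The six-chapter, 20-minute example from this page, with times in seconds.
example = [
    Chapter(0, "Introduction & Welcome"),
    Chapter(142, "Setting Up Your Environment"),
    Chapter(385, "Core Concepts Deep Dive"),
    Chapter(612, "Building the First Feature"),
    Chapter(891, "Testing & Debugging"),
    Chapter(1124, "Deployment & Next Steps"),
]
results = check_chapters(example, duration_seconds=1200, language_code="en")
```

Under these assumptions every check passes for the example, while a three-letter code such as "eng" or an out-of-order start time would fail its respective check.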
Efficiency targets
  • Latency: scores are normalized between 0 and 1. Under 10s earns 1.0; past 25s trends toward 0.
  • Token usage: scores are normalized between 0 and 1. Under 5,000 tokens earns 1.0; higher usage reduces the score.
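One way to read the efficiency targets is as a clamped linear ramp. The sketch below assumes linear interpolation between the full-score threshold and a zero-score point; the page states only the 10s, 25s, and 5,000-token thresholds, so the linear shape, the function names, and the 10,000-token zero point are hypothetical.

```python
def ramp_score(value, full_score_at, zero_at):
    """Score 1.0 at or below full_score_at, 0.0 at or above zero_at, linear between."""
    if value <= full_score_at:
        return 1.0
    if value >= zero_at:
        return 0.0
    return (zero_at - value) / (zero_at - full_score_at)


def latency_score(seconds):
    # Thresholds from the targets above: full score under 10s, near zero past 25s.
    return ramp_score(seconds, 10.0, 25.0)


def token_score(total_tokens, zero_at=10_000):
    # 5,000 comes from the targets above; zero_at is an assumed value.
    return ramp_score(total_tokens, 5_000, zero_at)
```

With this shape, a 1.32s response scores 1.0 on latency, while a 20.3s response scores roughly 0.31.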
Expense guardrails
  • Estimated cost under $0.015 per request for full score.
  • Usage data must include total tokens for cost analysis.
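The expense guardrails can be illustrated with a small check. This is a sketch under stated assumptions: the `usage` dictionary, its `total_tokens` key, and the `check_expense` function are hypothetical names, and only the $0.015 threshold comes from the guardrails above.

```python
FULL_SCORE_COST_USD = 0.015  # threshold stated in the guardrails above


def check_expense(usage, estimated_cost_usd):
    """Validate usage data and report whether the request earns the full cost score."""
    total_tokens = usage.get("total_tokens")  # key name is an assumption
    if total_tokens is None:
        # Without a token total there is nothing to base the cost analysis on.
        raise ValueError("usage must include total_tokens for cost analysis")
    return {
        "total_tokens": total_tokens,
        "estimated_cost_usd": estimated_cost_usd,
        "full_score": estimated_cost_usd < FULL_SCORE_COST_USD,
    }
```

For instance, a request that used 3,837 tokens at an estimated $0.0131 stays under the threshold, whereas a $0.02 request would not.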

Provider breakdown

Run c15880c
Efficacy score: higher is better
Latency: lower is better
Token usage: lower is better
Cost: lower is better

Provider   Model                          Cases  Avg Score  Avg Latency  Avg Tokens  Avg Cost  Avg Cost / Min
anthropic  claude-sonnet-4-5              5      0.99       4.57s        3,837       $0.0131   $0.0015/min
google     gemini-2.5-flash               5      0.98       7.21s        5,051       $0.0047   $0.0005/min
google     gemini-3-flash-preview         5      0.96       9.96s        5,622       $0.0073   $0.0009/min
google     gemini-3.1-flash-lite          5      0.99       1.48s        3,789       $0.0012   $0.0001/min
google     gemini-3.1-flash-lite-preview  5      0.99       1.32s        3,782       $0.0012   $0.0001/min
openai     gpt-5-mini                     4      0.91       20.3s        4,596       $0.0032   $0.0004/min
openai     gpt-5.1                        3      0.97       3.73s        3,349       $0.0017   $0.0002/min
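The headline figures can be cross-checked against the per-provider rows. The sketch below assumes the dashboard aggregates as a case-weighted mean, which the page does not state; it uses the rounded values shown in the breakdown, so small differences from the unrounded data are expected.

```python
# (provider, model, cases, avg_score, avg_latency_seconds), copied from the breakdown.
rows = [
    ("anthropic", "claude-sonnet-4-5", 5, 0.99, 4.57),
    ("google", "gemini-2.5-flash", 5, 0.98, 7.21),
    ("google", "gemini-3-flash-preview", 5, 0.96, 9.96),
    ("google", "gemini-3.1-flash-lite", 5, 0.99, 1.48),
    ("google", "gemini-3.1-flash-lite-preview", 5, 0.99, 1.32),
    ("openai", "gpt-5-mini", 4, 0.91, 20.3),
    ("openai", "gpt-5.1", 3, 0.97, 3.73),
]


def weighted_mean(rows, value_index):
    """Mean of one column, weighting each row by its case count."""
    total = sum(r[2] for r in rows)
    return sum(r[2] * r[value_index] for r in rows) / total


total_cases = sum(r[2] for r in rows)
overall_score = weighted_mean(rows, 3)
overall_latency = weighted_mean(rows, 4)
fastest = min(rows, key=lambda r: r[4])  # row with the lowest average latency
```

Under this assumption the case counts sum to 32, and the weighted score and latency round to the 0.97 and 6.72s shown in the run summary. The lowest-latency row is gemini-3.1-flash-lite-preview, matching the "Fastest" pick. Three models tie at 0.99 at the displayed precision, so the "Best Quality" pick presumably rests on unrounded scores.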

Recent cases

Latest 6

Provider   Model              Time              Asset    Score  Latency  Cost
anthropic  claude-sonnet-4-5  May 18, 06:01 PM  1XIUcA9  0.97   3.86s    $0.012
anthropic  claude-sonnet-4-5  May 18, 06:01 PM  1XIUcA9  1      5.28s    $0.0134
anthropic  claude-sonnet-4-5  May 18, 06:01 PM  1XIUcA9  1      4.54s    $0.0132
anthropic  claude-sonnet-4-5  May 18, 06:01 PM  1XIUcA9  1      4.8s     $0.0137
google     gemini-2.5-flash   May 18, 06:01 PM  1XIUcA9  0.96   7.56s    $0.0045
anthropic  claude-sonnet-4-5  May 18, 06:01 PM  1XIUcA9  0.96   4.39s    $0.0133