Workflow Eval Detail

Chapters

Automatically segments long-form video content into navigable chapters with timestamps and titles—enabling viewers to jump to key moments instantly.

Latest run: completed
muxinc/ai
main @ b7cce22 · @mux/ai v0.13.1
Cases: 27
Avg Score: 0.98
Avg Latency: 5.85s
Avg Cost: $0.0043
Avg Cost / Min: $0.0005/min
Avg Tokens: 3,663
TL;DR

Chapters performs accurately and cheaply across providers, with Google gemini-3.1-flash-lite-preview as the best current all-round option, though per-model results are based on only 2 runs each.

Best Quality: google · gemini-3.1-flash-lite-preview
Fastest: google · gemini-3.1-flash-lite-preview
Most Economical: google · gemini-3.1-flash-lite-preview

What we measure

Each eval run captures efficacy, efficiency, and expense. We use this data to compare providers and track regressions over time.

Efficacy: quality + correctness
Efficiency: latency + token usage
Expense: cost per request

Workflow snapshot

Suite status: success
Suite average score: 0.96
Suite duration: 1 minute 24 seconds
Last suite run: Apr 3, 07:56 PM

Evaluation criteria

From eval tests

We evaluate chapter segmentation quality, timestamp accuracy, and title relevance alongside latency and cost metrics.

Temporal segmentation example (20:00 video, 6 AI-generated chapters, language: en):

1. Introduction & Welcome (0:00)
2. Setting Up Your Environment (2:22)
3. Core Concepts Deep Dive (6:25)
4. Building the First Feature (10:12)
5. Testing & Debugging (14:51)
6. Deployment & Next Steps (18:44)
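For reference, the example segmentation above can be expressed as data. The field names here (`title`, `start`, `language`) are assumptions for illustration only, not the actual `@mux/ai` response schema.

```typescript
// Hypothetical shape of a chapters result. Field names are illustrative
// assumptions, not taken from the @mux/ai API.
interface Chapter {
  title: string;
  start: string; // "m:ss" timestamp into the video
}

interface ChaptersResult {
  language: string; // ISO 639-1 code, e.g. "en"
  chapters: Chapter[];
}

// The example segmentation shown above, as data.
const example: ChaptersResult = {
  language: "en",
  chapters: [
    { title: "Introduction & Welcome", start: "0:00" },
    { title: "Setting Up Your Environment", start: "2:22" },
    { title: "Core Concepts Deep Dive", start: "6:25" },
    { title: "Building the First Feature", start: "10:12" },
    { title: "Testing & Debugging", start: "14:51" },
    { title: "Deployment & Next Steps", start: "18:44" },
  ],
};
```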
Efficacy checks
  • Chapter titles are non-empty and descriptive.
  • Start times are valid and monotonically increasing.
  • Chapter count is reasonable for video duration.
  • Language code matches expected ISO 639-1 format.
  • Semantic coherence between chapter titles and content.
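The structural checks in the list above can be sketched as small predicates. This is an illustrative sketch, not the eval suite's actual implementation; function names and parsing details are assumptions.

```typescript
// Parse "m:ss" or "h:mm:ss" timestamps into seconds.
function toSeconds(ts: string): number {
  return ts.split(":").reduce((acc, part) => acc * 60 + Number(part), 0);
}

// Chapter titles must be non-empty after trimming.
function titlesNonEmpty(titles: string[]): boolean {
  return titles.every((t) => t.trim().length > 0);
}

// Start times must parse, begin at 0:00, and be strictly increasing.
function startsMonotonic(starts: string[]): boolean {
  const secs = starts.map(toSeconds);
  if (secs.length === 0 || secs.some(Number.isNaN) || secs[0] !== 0) return false;
  return secs.every((s, i) => i === 0 || s > secs[i - 1]);
}

// Language code must look like a two-letter ISO 639-1 code.
function isIso6391(code: string): boolean {
  return /^[a-z]{2}$/.test(code);
}
```

Semantic coherence between titles and content is the one check that cannot be reduced to a predicate like these; it requires comparing titles against the transcript or video content.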
Efficiency targets
  • Latency: scores are normalized between 0 and 1. Under 10s earns 1.0; past 25s trends toward 0.
  • Token usage: scores are normalized between 0 and 1. Under 5,000 tokens earns 1.0; higher usage reduces the score.
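One way to realize these targets is full credit below a threshold with linear decay to zero at a cap. The curve shape and the token-score upper bound are assumptions; the suite's exact normalization may differ.

```typescript
// Normalize a "lower is better" metric into [0, 1]: full credit at or below
// fullCredit, linear decay to 0 at zeroAt.
function normalize(value: number, fullCredit: number, zeroAt: number): number {
  if (value <= fullCredit) return 1;
  if (value >= zeroAt) return 0;
  return (zeroAt - value) / (zeroAt - fullCredit);
}

// Latency: under 10s earns 1.0, trending to 0 by 25s (per the stated targets).
const latencyScore = (seconds: number) => normalize(seconds, 10, 25);

// Tokens: under 5,000 earns 1.0; the 20,000 zero point is an assumption.
const tokenScore = (tokens: number) => normalize(tokens, 5_000, 20_000);
```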
Expense guardrails
  • Estimated cost under $0.015 per request for full score.
  • Usage data must include total tokens for cost analysis.
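A minimal sketch of the guardrails above, assuming a binary pass/fail score; the actual suite may grade cost on a curve instead.

```typescript
// Expense guardrail sketch: full score only when cost is under $0.015 per
// request AND usage data reports total tokens (without tokens, cost cannot
// be analyzed, so the check fails).
function expenseScore(costUsd: number, totalTokens?: number): number {
  if (totalTokens === undefined) return 0;
  return costUsd < 0.015 ? 1 : 0;
}
```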

Provider breakdown

Run b7cce22
Efficacy score: higher is better · Latency: lower is better · Token usage: lower is better · Cost: lower is better

Provider   Model                          Cases  Avg Score  Avg Latency  Avg Tokens  Avg Cost  Avg Cost / Min
anthropic  claude-sonnet-4-5                  5       0.98        4.02s       3,272   $0.0113     $0.0013/min
google     gemini-2.5-flash                   5       0.98        6.37s       4,367   $0.0041     $0.0006/min
google     gemini-3-flash-preview             5       0.98        6.06s       4,165   $0.0039     $0.0004/min
google     gemini-3.1-flash-lite-preview      5       0.99        1.75s       3,282   $0.0010     $0.0001/min
openai     gpt-5-mini                         4       0.95       14.71s       3,728   $0.0025     $0.0003/min
openai     gpt-5.1                            3       0.98        2.71s       2,856   $0.0017     $0.0002/min

Recent cases

Latest 6
anthropic · claude-sonnet-4-5 · Apr 3, 07:58 PM · Asset 1XIUcA9 · Score 0.96 · Latency 3.53s · Cost $0.0103
anthropic · claude-sonnet-4-5 · Apr 3, 07:58 PM · Asset 1XIUcA9 · Score 1.00 · Latency 4.74s · Cost $0.0118
anthropic · claude-sonnet-4-5 · Apr 3, 07:58 PM · Asset 1XIUcA9 · Score 1.00 · Latency 3.57s · Cost $0.0112
anthropic · claude-sonnet-4-5 · Apr 3, 07:58 PM · Asset 1XIUcA9 · Score 1.00 · Latency 4.11s · Cost $0.0119
google · gemini-2.5-flash · Apr 3, 07:58 PM · Asset 1XIUcA9 · Score 0.96 · Latency 8.40s · Cost $0.0059
anthropic · claude-sonnet-4-5 · Apr 3, 07:58 PM · Asset 1XIUcA9 · Score 0.96 · Latency 4.13s · Cost $0.0115