Workflow Eval Detail

Chapters

Automatically segments long-form video content into navigable chapters with timestamps and titles, enabling viewers to jump to key moments instantly.

Latest run: completed
Repository: muxinc/ai
Branch: main · commit d5b5d84 · @mux/ai v0.7.4
Cases: 10
Avg Score: 0.96
Avg Latency: 8.32s
Avg Cost: $0.0047
Avg Cost / Min: $0.0006/min
Avg Tokens: 3,710
TL;DR

The Chapters workflow performs with high structural accuracy and low cost across providers. OpenAI gpt-5.1 currently offers the best quality/latency/cost tradeoff, though these results rest on a small sample of runs (two cases per model).

Best Quality: openai gpt-5.1
Fastest: openai gpt-5.1
Most Economical: openai gpt-5.1

What we measure

Each eval run captures efficacy, efficiency, and expense. We use this data to compare providers and track regressions over time.

Efficacy: quality + correctness
Efficiency: latency + token usage
Expense: cost per request
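
To make the three dimensions concrete, here is a hypothetical TypeScript shape for a single case result; the field names are illustrative and not taken from the @mux/ai eval schema:

```ts
// Hypothetical record for one eval case (illustrative only, not the
// actual @mux/ai schema): one entry per provider/model/case run.
interface EvalCaseResult {
  provider: "openai" | "anthropic" | "google";
  model: string;          // e.g. "gpt-5.1"
  assetId: string;        // Mux asset the case ran against
  score: number;          // efficacy, normalized to [0, 1]
  latencySeconds: number; // end-to-end request latency
  totalTokens: number;    // prompt + completion tokens
  costUsd: number;        // estimated cost per request
}
```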

Workflow snapshot

Suite status: success
Suite average score: 0.96
Suite duration: 1 minute 36 seconds
Last suite run: Feb 18, 09:10 PM

Evaluation criteria

Drawn from the eval tests, we evaluate chapter segmentation quality, timestamp accuracy, and title relevance alongside latency and cost metrics.

Example output: six AI-generated chapters for a 20:00 video (language: en).

1. Introduction & Welcome (0:00)
2. Setting Up Your Environment (2:22)
3. Core Concepts Deep Dive (6:25)
4. Building the First Feature (10:12)
5. Testing & Debugging (14:51)
6. Deployment & Next Steps (18:44)
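
The output above reduces to a titled, timestamped segment list. Here is a plausible TypeScript shape for it, an assumption used by the sketches that follow rather than the published @mux/ai types:

```ts
// Assumed output shape for the Chapters workflow; field names are
// illustrative, not copied from the @mux/ai package.
interface Chapter {
  title: string;     // e.g. "Introduction & Welcome"
  startTime: number; // seconds from the start of the video
}

interface ChaptersResult {
  language: string;  // ISO 639-1 code, e.g. "en"
  chapters: Chapter[];
}
```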
Efficacy checks
  • Chapter titles are non-empty and descriptive.
  • Start times are valid and monotonically increasing.
  • Chapter count is reasonable for video duration.
  • Language code matches expected ISO 639-1 format.
  • Semantic coherence between chapter titles and content.
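
A minimal sketch of these structural checks against the assumed ChaptersResult shape above; the semantic-coherence check is judged separately (for example by an LLM grader) and is not reproduced here:

```ts
// Structural efficacy checks from the list above. The thresholds for a
// "reasonable" chapter count are assumed, not the eval's actual values.
function passesStructuralChecks(
  result: ChaptersResult,
  durationSeconds: number
): boolean {
  const { chapters, language } = result;
  if (chapters.length === 0) return false;

  // Titles are non-empty (descriptiveness is left to the semantic judge).
  if (chapters.some((c) => c.title.trim().length === 0)) return false;

  // Start times are valid, monotonically increasing, and within the video.
  if (chapters[0].startTime < 0) return false;
  for (let i = 1; i < chapters.length; i++) {
    if (chapters[i].startTime <= chapters[i - 1].startTime) return false;
  }
  if (chapters[chapters.length - 1].startTime >= durationSeconds) return false;

  // Chapter count is reasonable: roughly one per 2-10 minutes (assumed).
  const minutes = durationSeconds / 60;
  if (chapters.length > Math.ceil(minutes / 2)) return false;
  if (chapters.length < Math.floor(minutes / 10)) return false;

  // Language code matches the ISO 639-1 format (two lowercase letters).
  return /^[a-z]{2}$/.test(language);
}
```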
Efficiency targets
  • Latency: scores are normalized between 0 and 1. Under 10s earns 1.0; past 25s trends toward 0.
  • Token usage: scores are normalized between 0 and 1. Under 5,000 tokens earns 1.0; higher usage reduces the score.
Expense guardrails
  • Estimated cost under $0.015 per request for full score.
  • Usage data must include total tokens for cost analysis.
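
One way the efficiency and expense scoring could be implemented is a linear ramp: full credit at or below a floor, falling to zero at a ceiling. The floors match the stated targets; the exact falloff shape and the token ceiling are assumptions:

```ts
// Linear ramp: 1.0 at or below `fullBelow`, 0 at or above `zeroAt`.
function rampScore(value: number, fullBelow: number, zeroAt: number): number {
  if (value <= fullBelow) return 1;
  if (value >= zeroAt) return 0;
  return (zeroAt - value) / (zeroAt - fullBelow);
}

const latencyScore = (seconds: number) => rampScore(seconds, 10, 25);
const tokenScore = (tokens: number) => rampScore(tokens, 5_000, 20_000); // ceiling assumed
const costWithinGuardrail = (usd: number) => usd < 0.015; // full score under $0.015/request
```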

Provider breakdown

Run d5b5d84
Efficacy score: higher is better. Latency, token usage, and cost: lower is better.
Provider  | Model                  | Cases | Avg Score | Avg Latency | Avg Tokens | Avg Cost | Avg Cost / Min
anthropic | claude-sonnet-4-5      | 2     | 0.97      | 4.27s       | 3,173      | $0.0105  | $0.0012/min
google    | gemini-2.5-flash       | 2     | 0.96      | 6.64s       | 4,470      | $0.0045  | $0.0005/min
google    | gemini-3-flash-preview | 2     | 0.96      | 8.47s       | 4,357      | $0.0046  | $0.0005/min
openai    | gpt-5-mini             | 2     | 0.92      | 20.01s      | 3,782      | $0.0025  | $0.0003/min
openai    | gpt-5.1                | 2     | 0.97      | 2.2s        | 2,771      | $0.0014  | $0.0002/min
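
The summary picks at the top (Best Quality, Fastest, Most Economical) fall out of this table directly. A sketch of the selection, using an assumed aggregated row shape mirroring the columns above:

```ts
// Assumed per-provider/model aggregate, mirroring the table columns.
interface ProviderRow {
  provider: string;
  model: string;
  avgScore: number;
  avgLatencySeconds: number;
  avgCostUsd: number;
}

const bestQuality = (rows: ProviderRow[]) =>
  [...rows].sort((a, b) => b.avgScore - a.avgScore)[0];
const fastest = (rows: ProviderRow[]) =>
  [...rows].sort((a, b) => a.avgLatencySeconds - b.avgLatencySeconds)[0];
const mostEconomical = (rows: ProviderRow[]) =>
  [...rows].sort((a, b) => a.avgCostUsd - b.avgCostUsd)[0];
```

Note that on efficacy alone gpt-5.1 and claude-sonnet-4-5 tie at 0.97, so the Best Quality pick presumably breaks ties on latency or cost, where gpt-5.1 leads.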

Recent cases

Latest 6 (all against asset 1XIUcA9, Feb 18, 09:12 PM):

Provider  | Model             | Score | Latency | Cost
openai    | gpt-5.1           | 0.97  | 2.49s   | $0.0017
openai    | gpt-5.1           | 0.97  | 1.9s    | $0.0011
openai    | gpt-5-mini        | 0.9   | 27.87s  | $0.0034
openai    | gpt-5-mini        | 0.94  | 12.15s  | $0.0015
anthropic | claude-sonnet-4-5 | 0.97  | 5.06s   | $0.011
anthropic | claude-sonnet-4-5 | 0.97  | 3.49s   | $0.0101