Workflow Eval Detail

Chapters

Automatically segments long-form video content into navigable chapters with timestamps and titles, so viewers can jump to key moments instantly.

Latest Run: completed

muxinc/ai

main·d5b5d84·@mux/ai v0.7.4

Cases: 10
Avg Score: 0.96
Avg Latency: 8.32s
Avg Cost: $0.0047
Avg Cost / Min: $0.0006/min
Avg Tokens: 3,710

TL;DR

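The Chapters workflow averages a 0.96 score at about $0.0047 per request across all providers. OpenAI gpt-5.1 currently offers the best quality/latency/cost tradeoff (0.97 score, tied with claude-sonnet-4-5; 2.2s average latency; $0.0014 average cost). Each model ran only 2 cases (10 in total), so treat the ranking as provisional.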

Best Quality: openai gpt-5.1
Fastest: openai gpt-5.1
Most Economical: openai gpt-5.1

What we measure

Each eval run captures efficacy, efficiency, and expense. We use this data to compare providers and track regressions over time.

Efficacy: Quality + correctness
Efficiency: Latency + token usage
Expense: Cost per request

Workflow snapshot

Suite status: success
Suite average score: 0.96
Suite duration: 1 minute 36 seconds
Last suite run: Feb 18, 09:10 PM

Evaluation criteria

From eval tests

We evaluate chapter segmentation quality, timestamp accuracy, and title relevance alongside latency and cost metrics.

Example: Temporal Segmentation

A 20:00 video (0:00 to 20:00) segmented into 6 AI chapters. Language: en

#	Title	Start
1	Introduction & Welcome	0:00
2	Setting Up Your Environment	2:22
3	Core Concepts Deep Dive	6:25
4	Building the First Feature	10:12
5	Testing & Debugging	14:51
6	Deployment & Next Steps	18:44

Efficacy checks

Chapter titles are non-empty and descriptive.
Start times are valid and monotonically increasing.
Chapter count is reasonable for video duration.
Language code matches expected ISO 639-1 format.
Semantic coherence between chapter titles and content.
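
The structural checks above can be sketched in a few lines. This is a minimal illustration, not the eval suite's actual code: the `Chapter` type, `check_chapters`, and the `max_per_minute` limit are hypothetical names and thresholds, and "monotonically increasing" is assumed to mean strictly increasing. The semantic coherence check is omitted because it needs a judgment about the content rather than a structural test.

```python
import re
from dataclasses import dataclass


@dataclass
class Chapter:
    title: str
    start_seconds: int


def parse_timestamp(text: str) -> int:
    """Convert "m:ss" or "h:mm:ss" into a number of seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def check_chapters(chapters, language, duration_seconds, max_per_minute=1.0):
    """Return a list of problems; an empty list means every check passed."""
    if not chapters:
        return ["no chapters returned"]
    problems = []
    for index, chapter in enumerate(chapters, start=1):
        if not chapter.title.strip():
            problems.append(f"chapter {index}: empty title")
        if not 0 <= chapter.start_seconds < duration_seconds:
            problems.append(f"chapter {index}: start time outside the video")
    starts = [chapter.start_seconds for chapter in chapters]
    if any(later <= earlier for earlier, later in zip(starts, starts[1:])):
        problems.append("start times are not strictly increasing")
    # Assumed rule of thumb: at most one chapter per minute of video.
    limit = max(1, int(duration_seconds / 60 * max_per_minute))
    if len(chapters) > limit:
        problems.append("too many chapters for the video duration")
    # Format check only: two lowercase letters, as in ISO 639-1 codes.
    if not re.fullmatch(r"[a-z]{2}", language):
        problems.append("language code is not in ISO 639-1 format")
    return problems


# The six-chapter example shown on this page, for a 20:00 video.
EXAMPLE = [
    Chapter("Introduction & Welcome", parse_timestamp("0:00")),
    Chapter("Setting Up Your Environment", parse_timestamp("2:22")),
    Chapter("Core Concepts Deep Dive", parse_timestamp("6:25")),
    Chapter("Building the First Feature", parse_timestamp("10:12")),
    Chapter("Testing & Debugging", parse_timestamp("14:51")),
    Chapter("Deployment & Next Steps", parse_timestamp("18:44")),
]

print(check_chapters(EXAMPLE, "en", parse_timestamp("20:00")))
```

Returning a list of problems rather than a single boolean makes it easier to see which check failed when a score drops between runs.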

Efficiency targets

Latency: scores are normalized between 0 and 1. Under 10s earns 1.0; past 25s trends toward 0.
Token usage: scores are normalized between 0 and 1. Under 5,000 tokens earns 1.0; higher usage reduces the score.
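
As an illustration of how such normalized scores could be computed, here is a hedged sketch. The page gives only the anchor points (10s and 25s for latency, 5,000 tokens for usage), so the shape of the curve between and beyond them is an assumption: latency is interpolated linearly and clamped at 0, and token usage decays as the ratio of budget to usage. The function names are hypothetical.

```python
def latency_score(latency_s: float, full: float = 10.0, zero: float = 25.0) -> float:
    """1.0 at or under `full` seconds, falling linearly to 0.0 at `zero` seconds.

    The linear falloff and the hard floor at 25s are assumptions.
    """
    if latency_s <= full:
        return 1.0
    if latency_s >= zero:
        return 0.0
    return (zero - latency_s) / (zero - full)


def token_score(total_tokens: int, budget: int = 5000) -> float:
    """1.0 at or under `budget` tokens; above it, the score is budget / usage.

    The inverse-ratio decay is an assumption.
    """
    if total_tokens <= budget:
        return 1.0
    return budget / total_tokens


# Averages reported on this page for gpt-5.1 (2.2s, 2,771 tokens).
print(latency_score(2.2), token_score(2771))
```

Under these assumptions, the 27.87s gpt-5-mini case listed under Recent cases would receive a latency score of 0, while its 12.15s case would lose only part of the score.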

Expense guardrails

Estimated cost under $0.015 per request for full score.
Usage data must include total tokens for cost analysis.
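
A guardrail of this kind might look like the sketch below. It is illustrative only: the `total_tokens` key, the `expense_score` name, and the decay applied above the $0.015 budget are assumptions, not the suite's actual implementation.

```python
def expense_score(usage: dict, cost_usd: float, budget_usd: float = 0.015) -> float:
    """Full score at or under the budget; above it, score is budget / cost (assumed)."""
    # Cost analysis depends on token counts, so reject usage data without them.
    if usage.get("total_tokens") is None:
        raise ValueError("usage data must include total tokens")
    if cost_usd <= budget_usd:
        return 1.0
    return budget_usd / cost_usd


# Averages reported on this page for claude-sonnet-4-5 (3,173 tokens, $0.0105).
print(expense_score({"total_tokens": 3173}, 0.0105))
```

Raising an error on missing usage data, rather than scoring it as zero, keeps incomplete provider responses from being mistaken for expensive ones.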

Provider breakdown

Run d5b5d84

Efficacy score: higher is better
Latency: lower is better
Token usage: lower is better
Cost: lower is better

Provider	Model	Cases	Avg Score	Avg Latency	Avg Tokens	Avg Cost	Avg Cost / Min
anthropic	claude-sonnet-4-5	2	0.97	4.27s	3,173	$0.0105	$0.0012/min
google	gemini-2.5-flash	2	0.96	6.64s	4,470	$0.0045	$0.0005/min
google	gemini-3-flash-preview	2	0.96	8.47s	4,357	$0.0046	$0.0005/min
openai	gpt-5-mini	2	0.92	20.01s	3,782	$0.0025	$0.0003/min
openai	gpt-5.1	2	0.97	2.2s	2,771	$0.0014	$0.0002/min
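
The summary picks (Best Quality, Fastest, Most Economical) and the suite averages can be reproduced from the table above. The sketch below assumes that ties on score are broken by lower latency, which is a guess: gpt-5.1 and claude-sonnet-4-5 both score 0.97, and the page does not say how the tie is resolved. The `Row` type is hypothetical.

```python
from statistics import mean
from typing import NamedTuple


class Row(NamedTuple):
    provider: str
    model: str
    score: float
    latency_s: float
    cost_usd: float


# Values copied from the provider breakdown table (2 cases per model).
ROWS = [
    Row("anthropic", "claude-sonnet-4-5", 0.97, 4.27, 0.0105),
    Row("google", "gemini-2.5-flash", 0.96, 6.64, 0.0045),
    Row("google", "gemini-3-flash-preview", 0.96, 8.47, 0.0046),
    Row("openai", "gpt-5-mini", 0.92, 20.01, 0.0025),
    Row("openai", "gpt-5.1", 0.97, 2.2, 0.0014),
]

# Highest score wins; on a tie, the lower latency wins (assumed tie-break).
best_quality = max(ROWS, key=lambda r: (r.score, -r.latency_s))
fastest = min(ROWS, key=lambda r: r.latency_s)
most_economical = min(ROWS, key=lambda r: r.cost_usd)

# Every model ran the same number of cases, so a plain mean of the
# per-model averages matches the per-case suite average.
suite_score = round(mean(r.score for r in ROWS), 2)
suite_latency = round(mean(r.latency_s for r in ROWS), 2)

print(best_quality.model, fastest.model, most_economical.model)
print(suite_score, suite_latency)
```

If models were run on different numbers of cases, the suite averages would need to be weighted by case count instead.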

Recent cases

Latest 6

Provider	Model	Run Time	Asset	Score	Latency	Cost
openai	gpt-5.1	Feb 18, 09:12 PM	1XIUcA9	0.97	2.49s	$0.0017
openai	gpt-5.1	Feb 18, 09:12 PM	1XIUcA9	0.97	1.9s	$0.0011
openai	gpt-5-mini	Feb 18, 09:12 PM	1XIUcA9	0.9	27.87s	$0.0034
openai	gpt-5-mini	Feb 18, 09:12 PM	1XIUcA9	0.94	12.15s	$0.0015
anthropic	claude-sonnet-4-5	Feb 18, 09:12 PM	1XIUcA9	0.97	5.06s	$0.011
anthropic	claude-sonnet-4-5	Feb 18, 09:12 PM	1XIUcA9	0.97	3.49s	$0.0101