Workflow Eval Detail

Burned-in Captions

Analyzes video frames to detect hardcoded captions baked into the visual content—useful for compliance checks and accessibility audits.

Latest Runcompleted

muxinc/ai

main·d5b5d84·@mux/ai v0.7.4

Cases

Avg Score

0.96

Avg Latency

6.15s

Avg Cost

$0.003

Avg Cost / Min

$0.0104/min

Avg Tokens

2,299

TL;DR

All providers perform very well on burned-in captions, with gpt-5.1 offering the best latency and cost balance and claude-sonnet-4-5 leading on pure quality at higher cost.

Best Quality

anthropic

claude-sonnet-4-5

Fastest

openai

gpt-5.1

Most Economical

openai

gpt-5.1

What we measure

Each eval run captures efficacy, efficiency, and expense. We use this data to compare providers and track regressions over time.

Efficacy

Quality + correctness

Efficiency

Latency + token usage

Expense

Cost per request

Workflow snapshot

Suite statussuccess

Suite average score0.96

Suite duration1 minute 32 seconds

Last suite runFeb 18, 09:10 PM

Evaluation criteria

From eval tests

We evaluate caption detection accuracy, confidence calibration, and response integrity alongside speed and cost thresholds.

Captions Detected

[ DIRECTOR YELLING SCENE: 1, TAKE: 9 ]

Spatial Decomposition

Efficacy checks

Caption presence matches ground truth labels.
Confidence >0.8 when captions are expected.
Confidence is 0-1, language is string or null, and storyboard URL is HTTPS.

Efficiency targets

Latency: scores are normalized between 0 and 1. Under 5s earns 1.0; past 12s trends toward 0.
Token usage: scores are normalized between 0 and 1. Under 4,000 tokens earns 1.0; higher usage reduces the score.

Expense guardrails

Estimated cost under $0.012 per request for full score.
Usage data must include total tokens for cost analysis.

Provider breakdown

Run d5b5d84

Efficacy scoreHigher is better

LatencyLower is better

Token UsageLower is better

CostLower is better

Provider	Model	Cases	Avg Score	Avg Latency	Avg Tokens	Avg Cost	Avg Cost / Min
anthropic	claude-sonnet-4-5	3	1	2.78s	2,448	$0.0077	$0.0269/min
google	gemini-2.5-flash	3	0.99	5.09s	1,867	$0.0022	$0.0077/min
google	gemini-3-flash-preview	3	0.99	5.18s	2,577	$0.0021	$0.0073/min
openai	gpt-5-mini	3	0.91	15.58s	2,974	$0.0022	$0.0078/min
openai	gpt-5.1	3	0.9	2.12s	1,631	$0.0006	$0.0021/min

Recent cases

Latest 6

openai ·gpt-5.1Feb 18, 09:12 PM

Asset gEvCHSJ

Score

0.71

Latency

2.73s

Cost

$0.0006

openai ·gpt-5-miniFeb 18, 09:12 PM

Asset atuutlT

Score

0.91

Latency

14.72s

Cost

$0.0018

openai ·gpt-5-miniFeb 18, 09:12 PM

Asset gEvCHSJ

Score

0.91

Latency

15.61s

Cost

$0.0019

anthropic ·claude-sonnet-4-5Feb 18, 09:12 PM

Asset atuutlT

Score

Latency

2.74s

Cost

$0.0082

anthropic ·claude-sonnet-4-5Feb 18, 09:12 PM

Asset gEvCHSJ

Score

Latency

2.74s

Cost

$0.0082

openai ·gpt-5.1Feb 18, 09:12 PM

Asset atuutlT

Score

Latency

2.28s

Cost

$0.0006