Workflow Eval Detail

Burned-in Captions

Analyzes video frames to detect hardcoded captions baked into the visual content—useful for compliance checks and accessibility audits.

Latest Run: completed
muxinc/ai
main · c15880c · @mux/ai v0.22.0
Cases: 21
Avg Score: 0.98
Avg Latency: 5.13s
Avg Cost: $0.0028
Avg Cost / Min: $0.0099/min
Avg Tokens: 2,890
TL;DR

Near-perfect caption detection across all providers at very low cost, with Google flash-lite models leading on latency and expense. Each model has been tested on only three cases, so treat the rankings as provisional.

Best Quality: openai · gpt-5.1
Fastest: google · gemini-3.1-flash-lite-preview
Most Economical: google · gemini-3.1-flash-lite

What we measure

Each eval run captures efficacy, efficiency, and expense. We use this data to compare providers and track regressions over time.

Efficacy: Quality + correctness
Efficiency: Latency + token usage
Expense: Cost per request

Workflow snapshot

Suite status: success
Suite average score: 0.98
Suite duration: 1 minute 47 seconds
Last suite run: May 18, 05:59 PM

Evaluation criteria

From eval tests

We evaluate caption detection accuracy, confidence calibration, and response integrity alongside speed and cost thresholds.

[Illustration: sample frame labeled "Captions Detected" and "Spatial Decomposition", showing the burned-in caption "[ DIRECTOR YELLING SCENE: 1, TAKE: 9 ]"]
Efficacy checks
  • Caption presence matches ground truth labels.
  • Confidence >0.8 when captions are expected.
  • Response integrity: confidence is a number between 0 and 1, language is a string or null, and the storyboard URL uses HTTPS.
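
The efficacy checks above can be sketched as a small validator. This is only an illustration, not the eval suite's actual code: the field names (`has_burned_in_captions`, `confidence`, `detected_language`, `storyboard_url`) and the example URL are hypothetical stand-ins for whatever the workflow response actually contains.

```python
from urllib.parse import urlparse


def check_response(response: dict, expected_captions: bool) -> list[str]:
    """Return a list of failed checks; an empty list means every check passed.

    All field names below are hypothetical; adapt them to the real response shape.
    """
    failures = []

    # Response integrity: confidence must be a real number in [0, 1].
    confidence = response.get("confidence")
    valid_confidence = (
        isinstance(confidence, (int, float))
        and not isinstance(confidence, bool)
        and 0.0 <= confidence <= 1.0
    )
    if not valid_confidence:
        failures.append("confidence must be a number between 0 and 1")

    # Caption presence must match the ground truth label.
    if response.get("has_burned_in_captions") != expected_captions:
        failures.append("caption presence does not match ground truth")

    # Confidence should exceed 0.8 when captions are expected.
    if expected_captions and valid_confidence and confidence <= 0.8:
        failures.append("confidence should exceed 0.8 when captions are expected")

    # Language is a string or null (None in Python).
    language = response.get("detected_language")
    if language is not None and not isinstance(language, str):
        failures.append("language must be a string or null")

    # Storyboard URL must use HTTPS.
    url = response.get("storyboard_url")
    if not isinstance(url, str) or urlparse(url).scheme != "https":
        failures.append("storyboard URL must be HTTPS")

    return failures


example = {
    "has_burned_in_captions": True,
    "confidence": 0.93,
    "detected_language": "en",
    "storyboard_url": "https://example.com/storyboard.png",  # placeholder URL
}
print(check_response(example, expected_captions=True))
```

Returning the list of failures rather than a single boolean makes it easier to see which check a given model tripped on.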
Efficiency targets
  • Latency: scores are normalized between 0 and 1. Under 5s earns 1.0; past 12s trends toward 0.
  • Token usage: scores are normalized between 0 and 1. Under 4,000 tokens earns 1.0; higher usage reduces the score.
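
One way to picture the efficiency scoring is a clamped linear ramp. The page states only the thresholds (5s and 12s for latency, 4,000 tokens for usage), so the linear shape, the assumption that latency reaches exactly 0 at 12s, and the 8,000-token zero point are guesses made for illustration.

```python
def ramp_score(value: float, full: float, zero: float) -> float:
    """Score 1.0 at or below `full`, 0.0 at or above `zero`, linear in between.

    The linear shape is an assumption; the real curve may differ.
    """
    if value <= full:
        return 1.0
    if value >= zero:
        return 0.0
    return (zero - value) / (zero - full)


def latency_score(seconds: float) -> float:
    # Thresholds from the criteria: under 5s earns 1.0, past 12s trends toward 0.
    return ramp_score(seconds, full=5.0, zero=12.0)


def token_score(total_tokens: int) -> float:
    # 4,000 comes from the criteria; the 8,000 zero point is hypothetical.
    return ramp_score(float(total_tokens), full=4000.0, zero=8000.0)


print(latency_score(2.6))   # within the 5s budget
print(latency_score(8.5))   # halfway between 5s and 12s
print(token_score(2890))    # within the 4,000-token budget
```

Under these assumptions a request at the suite's average latency of 5.13s would lose only a small fraction of its latency score, while one above 12s would lose all of it.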
Expense guardrails
  • Estimated cost under $0.012 per request for full score.
  • Usage data must include total tokens for cost analysis.
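
The expense guardrails can be sketched in the same spirit. Only the $0.012 threshold and the total-tokens requirement come from the criteria above. The `total_tokens` key name, the zero score for missing usage data, and the proportional penalty above budget are assumptions.

```python
COST_BUDGET_USD = 0.012  # threshold stated in the expense guardrails


def expense_score(estimated_cost_usd: float, usage: dict) -> float:
    """Full score under budget; proportional penalty above it (assumed shape)."""
    # Usage data must include total tokens, otherwise cost cannot be analyzed.
    total_tokens = usage.get("total_tokens")  # hypothetical key name
    if not isinstance(total_tokens, int) or total_tokens <= 0:
        return 0.0
    if estimated_cost_usd < COST_BUDGET_USD:
        return 1.0
    # Assumed penalty: a request costing twice the budget scores 0.5.
    return COST_BUDGET_USD / estimated_cost_usd


print(expense_score(0.0104, {"total_tokens": 3179}))  # under budget
print(expense_score(0.0104, {}))                      # missing usage data
```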

Provider breakdown

Run c15880c
Efficacy score: higher is better
Latency: lower is better
Token usage: lower is better
Cost: lower is better

Provider   Model                          Cases  Avg Score  Avg Latency  Avg Tokens  Avg Cost  Avg Cost / Min
anthropic  claude-sonnet-4-5              3      1          2.6s         3,179       $0.0099   $0.0345/min
google     gemini-2.5-flash               3      0.99       5.78s        2,580       $0.0026   $0.0091/min
google     gemini-3-flash-preview         3      1          4.16s        3,294       $0.0026   $0.009/min
google     gemini-3.1-flash-lite          3      1          1.97s        2,624       $0.0007   $0.0024/min
google     gemini-3.1-flash-lite-preview  3      1          1.63s        2,624       $0.0007   $0.0024/min
openai     gpt-5-mini                     3      0.9        17.57s       3,649       $0.0027   $0.0095/min
openai     gpt-5.1                        3      1          2.17s        2,279       $0.0007   $0.0025/min
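
As a sanity check, the headline averages can be reproduced from the per-model rows. The sketch below assumes the suite figures are case-weighted means of the per-model averages, which is not confirmed by the page. It reads the gpt-5-mini row as a score of 0.9 with 17.57s latency, because that is the only reading that reproduces the reported 0.98 and 5.13s.

```python
# (provider, model, cases, avg_score, avg_latency_s, avg_tokens, avg_cost_usd)
rows = [
    ("anthropic", "claude-sonnet-4-5", 3, 1.0, 2.6, 3179, 0.0099),
    ("google", "gemini-2.5-flash", 3, 0.99, 5.78, 2580, 0.0026),
    ("google", "gemini-3-flash-preview", 3, 1.0, 4.16, 3294, 0.0026),
    ("google", "gemini-3.1-flash-lite", 3, 1.0, 1.97, 2624, 0.0007),
    ("google", "gemini-3.1-flash-lite-preview", 3, 1.0, 1.63, 2624, 0.0007),
    ("openai", "gpt-5-mini", 3, 0.9, 17.57, 3649, 0.0027),
    ("openai", "gpt-5.1", 3, 1.0, 2.17, 2279, 0.0007),
]


def weighted_mean(column: int) -> float:
    """Mean of one metric column, weighted by each model's case count."""
    total_cases = sum(row[2] for row in rows)
    return sum(row[column] * row[2] for row in rows) / total_cases


total_cases = sum(row[2] for row in rows)
avg_score = round(weighted_mean(3), 2)
avg_latency = round(weighted_mean(4), 2)
avg_tokens = round(weighted_mean(5))
avg_cost = round(weighted_mean(6), 4)

print(total_cases, avg_score, avg_latency, avg_tokens, avg_cost)
# prints: 21 0.98 5.13 2890 0.0028
```

Because every model ran the same three cases, the weighted mean here equals the plain mean of the seven rows.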

Recent cases

Latest 6

Provider   Model              Time              Asset    Score  Latency  Cost
anthropic  claude-sonnet-4-5  May 18, 06:01 PM  gEvCHSJ  1      2.2s     $0.0104
anthropic  claude-sonnet-4-5  May 18, 06:01 PM  atuutlT  1      3.24s    $0.0104
google     gemini-2.5-flash   May 18, 06:01 PM  gIRjPqM  0.99   5.52s    $0.003
google     gemini-2.5-flash   May 18, 06:01 PM  gEvCHSJ  0.99   6.21s    $0.0022
google     gemini-2.5-flash   May 18, 06:01 PM  atuutlT  0.99   5.61s    $0.0026
anthropic  claude-sonnet-4-5  May 18, 06:01 PM  gIRjPqM  1      2.37s    $0.0088