Workflow Eval Detail

Caption Translation

Converts captions into multiple languages, helping you reach global audiences without manual translation work.

Latest Run: completed
muxinc/ai
main · c15880c · @mux/ai v0.22.0
Cases: 21
Avg Score: 0.96
Avg Latency: 9.11s
Avg Cost: $0.0059
Avg Cost / Min: $0.0103/min
Avg Tokens: 2,649

TL;DR

The Caption Translation workflow scores consistently high across providers. Google's gemini-3.1-flash-lite and gemini-3.1-flash-lite-preview offer the best balance of quality, latency, and cost, though per-model results are based on only 3 cases each.

Best Quality: google · gemini-3.1-flash-lite-preview
Fastest: google · gemini-3.1-flash-lite
Most Economical: google · gemini-3.1-flash-lite

What we measure

Each eval run captures efficacy, efficiency, and expense. We use this data to compare providers and track regressions over time.

Efficacy: Quality + correctness
Efficiency: Latency + token usage
Expense: Cost per request

Workflow snapshot

Suite status: success
Suite average score: 0.96
Suite duration: 4 minutes 48 seconds
Last suite run: May 18, 05:58 PM

Evaluation criteria

From eval tests

We validate VTT structure, translation faithfulness, and language code integrity, plus performance and budget targets.

Efficacy checks
  • Translated VTT starts with WEBVTT and keeps timestamps.
  • Cue count matches the original, and the translated text differs from the source text.
  • Faithfulness scoring against the original transcript.
  • Language codes match ISO 639-1/3 and are consistent.
  • Response preserves asset ID and language fields.
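The structural checks in this list could be sketched roughly as follows. This is a minimal illustration and not the eval code shipped in @mux/ai: the names `parseCueTimings`, `checkTranslatedVtt`, and `VttCheckResult` are hypothetical, and the sketch only compares cue timing lines. Faithfulness scoring and language-code checks are out of scope here.

```typescript
// Hypothetical helpers for illustration; not part of @mux/ai.

// Matches a WebVTT cue timing line such as "00:00:01.000 --> 00:00:04.000".
// The hours component is optional, so "mm:ss.ttt" is accepted as well.
const CUE_TIMING =
  /^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})/;

// Returns the normalized "start --> end" string of every cue in the file.
function parseCueTimings(vtt: string): string[] {
  return vtt
    .split(/\r?\n/)
    .map((line) => CUE_TIMING.exec(line.trim()))
    .filter((m): m is RegExpExecArray => m !== null)
    .map((m) => `${m[1]} --> ${m[2]}`);
}

interface VttCheckResult {
  startsWithHeader: boolean;    // file begins with "WEBVTT" (optional BOM allowed)
  cueCountMatches: boolean;     // same number of cues as the original
  timestampsPreserved: boolean; // every cue keeps its original timing
  textDiffers: boolean;         // output is not a verbatim copy of the input
}

function checkTranslatedVtt(original: string, translated: string): VttCheckResult {
  const originalCues = parseCueTimings(original);
  const translatedCues = parseCueTimings(translated);
  const cueCountMatches = originalCues.length === translatedCues.length;
  return {
    startsWithHeader: /^\uFEFF?WEBVTT/.test(translated),
    cueCountMatches,
    timestampsPreserved:
      cueCountMatches && originalCues.every((timing, i) => timing === translatedCues[i]),
    textDiffers: original.trim() !== translated.trim(),
  };
}
```

A translation passes this sketch only when all four flags are true; a real suite would likely weight or report them separately.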
Efficiency targets
  • Latency: scores are normalized between 0 and 1. Under 8s earns 1.0; past 15s trends toward 0.
  • Token usage: scores are normalized between 0 and 1. Under 2,500 tokens earns 1.0; higher usage reduces the score.
Expense guardrails
  • Estimated cost under $0.012 per request for full score.
  • Usage data must include total tokens for cost analysis.
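One way to read the normalized efficiency scores is as a clamped linear falloff between a "full score" threshold and a "zero score" bound. The sketch below assumes that shape; the page only states the endpoints for latency (8s and 15s) and the full-score threshold for tokens (2,500), so the linear interpolation, the hard clamp to 0 at the upper bound, and the function names are all assumptions. The token upper bound is not given, so it is left as a required argument here.

```typescript
// Assumed shape: 1.0 at or below `fullScoreAt`, 0 at or above `zeroScoreAt`,
// and a straight line in between. The real scoring curve may differ.
function normalizedScore(value: number, fullScoreAt: number, zeroScoreAt: number): number {
  if (value <= fullScoreAt) return 1;
  if (value >= zeroScoreAt) return 0;
  return (zeroScoreAt - value) / (zeroScoreAt - fullScoreAt);
}

// Latency: under 8s earns 1.0; past 15s trends toward 0 (clamped to 0 in this sketch).
function latencyScore(seconds: number): number {
  return normalizedScore(seconds, 8, 15);
}

// Tokens: under 2,500 earns 1.0. The page gives no upper bound,
// so the caller must supply one (hypothetical parameter).
function tokenScore(tokens: number, zeroScoreAt: number): number {
  return normalizedScore(tokens, 2500, zeroScoreAt);
}

console.log(latencyScore(7.69));  // 1 (below the 8s threshold)
console.log(latencyScore(11.5));  // 0.5 (halfway between 8s and 15s)
console.log(latencyScore(20.14)); // 0 (past the 15s bound)
```

Under this reading, a model averaging 7.69s would keep a full latency score, while one averaging 20.14s would score 0 on latency regardless of translation quality.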

Provider breakdown

Run c15880c
Efficacy score: higher is better
Latency: lower is better
Token Usage: lower is better
Cost: lower is better
Provider   Model                          Cases  Avg Score  Avg Latency  Avg Tokens  Avg Cost  Avg Cost / Min
anthropic  claude-sonnet-4-5              3      1          7.69s        1,896       $0.012    $0.021/min
google     gemini-2.5-flash               3      1          3.4s         1,946       $0.0021   $0.0037/min
google     gemini-3-flash-preview         3      0.83       20.14s       6,225       $0.0155   $0.0271/min
google     gemini-3.1-flash-lite          3      1          2.33s        1,855       $0.0012   $0.0021/min
google     gemini-3.1-flash-lite-preview  3      1          2.35s        1,859       $0.0012   $0.0021/min
openai     gpt-5-mini                     3      0.91       21.7s        3,211       $0.0045   $0.0078/min
openai     gpt-5.1                        3      1          6.18s        1,554       $0.0047   $0.0082/min
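The suite-level figures at the top of the page appear to be the case-weighted mean of these per-model rows. The sketch below recomputes them from the values in the breakdown as a cross-check. The `ModelRow` type and `weightedMean` helper are hypothetical names, and because the inputs are already rounded for display, the results should be treated as approximate.

```typescript
interface ModelRow {
  model: string;
  cases: number;
  avgScore: number;
  avgLatencySec: number;
  avgTokens: number;
  avgCostUsd: number;
}

// Values copied from the provider breakdown for run c15880c.
const rows: ModelRow[] = [
  { model: "claude-sonnet-4-5", cases: 3, avgScore: 1, avgLatencySec: 7.69, avgTokens: 1896, avgCostUsd: 0.012 },
  { model: "gemini-2.5-flash", cases: 3, avgScore: 1, avgLatencySec: 3.4, avgTokens: 1946, avgCostUsd: 0.0021 },
  { model: "gemini-3-flash-preview", cases: 3, avgScore: 0.83, avgLatencySec: 20.14, avgTokens: 6225, avgCostUsd: 0.0155 },
  { model: "gemini-3.1-flash-lite", cases: 3, avgScore: 1, avgLatencySec: 2.33, avgTokens: 1855, avgCostUsd: 0.0012 },
  { model: "gemini-3.1-flash-lite-preview", cases: 3, avgScore: 1, avgLatencySec: 2.35, avgTokens: 1859, avgCostUsd: 0.0012 },
  { model: "gpt-5-mini", cases: 3, avgScore: 0.91, avgLatencySec: 21.7, avgTokens: 3211, avgCostUsd: 0.0045 },
  { model: "gpt-5.1", cases: 3, avgScore: 1, avgLatencySec: 6.18, avgTokens: 1554, avgCostUsd: 0.0047 },
];

// Mean of one metric across rows, weighted by each row's case count.
function weightedMean(data: ModelRow[], pick: (r: ModelRow) => number): number {
  const totalCases = data.reduce((sum, r) => sum + r.cases, 0);
  const weightedSum = data.reduce((sum, r) => sum + pick(r) * r.cases, 0);
  return weightedSum / totalCases;
}

const suite = {
  cases: rows.reduce((sum, r) => sum + r.cases, 0),
  avgScore: weightedMean(rows, (r) => r.avgScore),
  avgLatencySec: weightedMean(rows, (r) => r.avgLatencySec),
  avgTokens: weightedMean(rows, (r) => r.avgTokens),
  avgCostUsd: weightedMean(rows, (r) => r.avgCostUsd),
};

console.log(suite.cases);                    // 21
console.log(suite.avgScore.toFixed(2));      // "0.96"
console.log(suite.avgLatencySec.toFixed(2)); // "9.11"
console.log(Math.round(suite.avgTokens));    // 2649
console.log(suite.avgCostUsd.toFixed(4));    // "0.0059"
```

Since every model ran the same number of cases, the weighted and unweighted means coincide here; the weighting only matters once case counts diverge between models.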

Recent cases

Latest 6
Provider   Model              Timestamp         Asset    Score  Latency  Cost
anthropic  claude-sonnet-4-5  May 18, 06:01 PM  88Lb01q  1      6.96s    $0.0119
anthropic  claude-sonnet-4-5  May 18, 06:01 PM  88Lb01q  0.99   8.64s    $0.0124
google     gemini-2.5-flash   May 18, 06:01 PM  88Lb01q  1      3.21s    $0.002
google     gemini-2.5-flash   May 18, 06:01 PM  88Lb01q  1      3.33s    $0.0022
google     gemini-2.5-flash   May 18, 06:01 PM  88Lb01q  1      3.65s    $0.0021
anthropic  claude-sonnet-4-5  May 18, 06:01 PM  88Lb01q  1      7.46s    $0.0118