Workflow Eval Detail

Caption Translation

Converts captions into multiple languages, helping you reach global audiences without manual translation work.

Latest Run: completed
muxinc/ai · main @ d5b5d84 · @mux/ai v0.7.4
• Cases: 15
• Avg Score: 0.94
• Avg Latency: 17.02s
• Avg Cost: $0.0077
• Avg Cost / Min: $0.0133/min
• Avg Tokens: 2,447
TL;DR

High-quality caption translations across all providers, with OpenAI gpt-5.1 as the best quality/latency default and Google gemini-2.5-flash as the most cost-efficient, based on a small 3-case sample per model.

• Best Quality: openai gpt-5.1
• Fastest: openai gpt-5.1
• Most Economical: google gemini-2.5-flash

What we measure

Each eval run captures efficacy, efficiency, and expense. We use this data to compare providers and track regressions over time.

• Efficacy: quality + correctness
• Efficiency: latency + token usage
• Expense: cost per request

Workflow snapshot

• Suite status: success
• Suite average score: 0.94
• Suite duration: 4 minutes 56 seconds
• Last suite run: Feb 18, 09:10 PM

Evaluation criteria

From eval tests

We validate VTT structure, translation faithfulness, and language code integrity, plus performance and budget targets.

Sample output: English translation "Together we can reach more" (EN / ENG, confidence 100%, contextual localization).
Efficacy checks
  • Translated VTT starts with WEBVTT and keeps timestamps.
  • Cue count matches the original and translation differs.
  • Faithfulness scoring against the original transcript.
  • Language codes match ISO 639-1/3 and are consistent.
  • Response preserves asset ID and language fields.
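The structural efficacy checks above can be sketched roughly as follows. This is an illustrative sketch, not the actual @mux/ai eval code; `validateTranslatedVtt` and its "any line containing `-->` is a cue timing line" heuristic are assumptions for the example.

```typescript
// Illustrative sketch of the structural efficacy checks (not the real eval
// implementation). A cue is identified by its "-->" timing line.

function cueTimestamps(vtt: string): string[] {
  return vtt.split("\n").filter((line) => line.includes("-->"));
}

function validateTranslatedVtt(original: string, translated: string): boolean {
  // Translated VTT must start with the WEBVTT header.
  if (!translated.trimStart().startsWith("WEBVTT")) return false;

  const origCues = cueTimestamps(original);
  const transCues = cueTimestamps(translated);

  // Cue count must match, and timestamps must be preserved verbatim.
  if (origCues.length !== transCues.length) return false;
  if (!origCues.every((ts, i) => ts.trim() === transCues[i].trim())) return false;

  // The text should actually differ from the source, i.e. was translated.
  return original.trim() !== translated.trim();
}
```

A real checker would also parse cue payloads and validate timestamp ordering; this sketch only covers the three bullet points it mirrors.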
Efficiency targets
  • Latency: scores are normalized between 0 and 1. Under 8s earns 1.0; past 15s trends toward 0.
  • Token usage: scores are normalized between 0 and 1. Under 2,500 tokens earns 1.0; higher usage reduces the score.
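The normalization described above could be implemented as a clamped linear ramp between the thresholds. The exact curve the suite uses is not published, and the 10,000-token zero point below is an assumption (the criteria only state that usage above 2,500 tokens reduces the score).

```typescript
// Sketch of threshold-based score normalization; the linear ramp and the
// token zero point (10,000) are assumptions, not the published scoring curve.

function rampScore(value: number, fullBelow: number, zeroAt: number): number {
  if (value <= fullBelow) return 1;
  if (value >= zeroAt) return 0;
  return (zeroAt - value) / (zeroAt - fullBelow);
}

// Latency: under 8s earns 1.0; trends toward 0 approaching 15s.
const latencyScore = (seconds: number) => rampScore(seconds, 8, 15);

// Tokens: under 2,500 earns 1.0; assumed to reach 0 at 10,000 tokens.
const tokenScore = (tokens: number) => rampScore(tokens, 2500, 10_000);
```

Under this sketch a 14.67s run (the gpt-5.1 average) would score just above zero on latency, while a 4.67s run scores a full 1.0.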
Expense guardrails
  • Estimated cost under $0.012 per request for full score.
  • Usage data must include total tokens for cost analysis.
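The expense guardrail amounts to a cost estimate gated on usage data being present. A minimal sketch, assuming hypothetical per-million-token prices (the real numbers depend on the provider) and the $0.012 budget stated above:

```typescript
// Sketch of the expense guardrail. Prices passed in are placeholders, not
// real provider pricing; only the $0.012 budget comes from the criteria.

interface Usage {
  inputTokens: number;
  outputTokens: number;
  totalTokens?: number; // must be present for cost analysis
}

const COST_BUDGET_USD = 0.012;

function estimateCost(
  usage: Usage,
  inPricePerMTok: number,
  outPricePerMTok: number,
): number | null {
  // Usage data must include total tokens, per the guardrail above.
  if (usage.totalTokens == null) return null;
  return (
    (usage.inputTokens * inPricePerMTok + usage.outputTokens * outPricePerMTok) /
    1_000_000
  );
}

function withinBudget(cost: number | null): boolean {
  return cost !== null && cost < COST_BUDGET_USD;
}
```

A run whose usage payload omits total tokens yields `null` and fails the guardrail outright, matching the second bullet.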

Provider breakdown

Run d5b5d84
Higher is better for efficacy score; lower is better for latency, token usage, and cost.

Provider   Model                   Cases  Avg Score  Avg Latency  Avg Tokens  Avg Cost  Avg Cost / Min
anthropic  claude-sonnet-4-5       3      1.00       8.3s         1,147       $0.0098   $0.017/min
google     gemini-2.5-flash        3      1.00       6.6s         2,090       $0.004    $0.007/min
google     gemini-3-flash-preview  3      0.82       26.1s        5,496       $0.0151   $0.0263/min
openai     gpt-5-mini              3      0.91       39.44s       2,634       $0.0045   $0.0079/min
openai     gpt-5.1                 3      1.00       4.67s        866         $0.0049   $0.0086/min

Recent cases

Latest 6

Provider   Model              Asset    Score  Latency  Cost     Timestamp
anthropic  claude-sonnet-4-5  88Lb01q  1      7.69s    $0.0096  Feb 18, 09:12 PM
openai     gpt-5.1            88Lb01q  1      4.5s     $0.0047  Feb 18, 09:12 PM
openai     gpt-5.1            88Lb01q  1      4.9s     $0.0049  Feb 18, 09:12 PM
openai     gpt-5.1            88Lb01q  1      4.61s    $0.0051  Feb 18, 09:12 PM
openai     gpt-5-mini         88Lb01q  0.92   37.63s   $0.0042  Feb 18, 09:12 PM
anthropic  claude-sonnet-4-5  88Lb01q  1      8.41s    $0.0095  Feb 18, 09:12 PM