@mux/ai Eval Results

Measuring efficacy, efficiency, and expense on every run.

Every workflow ships with evals that measure quality, speed, and cost. This dashboard surfaces the latest results so you can compare providers and trust the defaults we recommend.
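
Under the hood, each suite is an Evalite eval: a set of cases, a task under test, and scorers that grade every output from 0 to 1. Here is a minimal sketch of that shape, assuming Evalite's evalite() API and an autoevals scorer; the workflow name, data, and stand-in task are illustrative, not the actual @mux/ai suites:

```ts
import { evalite } from "evalite";
import { Levenshtein } from "autoevals";

// Hypothetical stand-in for the workflow under test.
async function summarize(input: string): Promise<string> {
  return "A short demo of the new editor.";
}

// Minimal Evalite suite shape; the real @mux/ai suites differ in data and scorers.
evalite("Summarization", {
  // Each case pairs an input with an expected output.
  data: async () => [
    {
      input: "transcript of a two-minute product demo...",
      expected: "A short demo of the new editor.",
    },
  ],
  // The task under test: normally the workflow call itself.
  task: async (input) => summarize(input),
  // Scorers grade each case from 0 to 1; Levenshtein is one autoevals built-in.
  scorers: [Levenshtein],
});
```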

Latest Run · completed
muxinc/ai · main · d5b5d84 · @mux/ai v0.7.4
Cases: 50 · Providers: 3 · Started: Feb 18, 09:10 PM · Completed: Feb 18, 09:12 PM

Workflow Scorecards

Run d5b5d84

Ask Questions

Evalite suite coverage

Cases: 5 · Avg Score: 0.98 · Avg Latency: 7.12s · Avg Cost: $0.0039

Answers natural-language questions about a video by retrieving relevant context and answering with a concise response.
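
As a rough sketch of this workflow's shape (askQuestion() and its signature are hypothetical, not the published @mux/ai API):

```ts
// Hypothetical sketch; askQuestion() is illustrative, not the published @mux/ai surface.
async function askQuestion(assetId: string, question: string): Promise<string> {
  // 1. Retrieve context relevant to the question (transcript chunks, metadata).
  // 2. Prompt a model with the question plus that context.
  // 3. Return a concise answer.
  return `For ${assetId}: the launch is announced at 02:14.`;
}

const answer = await askQuestion("asset_123", "When is the launch announced?");
console.log(answer);
```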

Burned-in Captions

Evalite suite coverage

Cases: 15 · Avg Score: 0.96 · Avg Latency: 6.15s · Avg Cost: $0.003

Analyzes video frames to detect hardcoded captions baked into the visual content—useful for compliance checks and accessibility audits.
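
A plausible result shape for this kind of detection; the fields below are illustrative, not the documented @mux/ai output:

```ts
// Hypothetical result shape for burned-in caption detection.
interface BurnedInCaptionsResult {
  detected: boolean;           // were hardcoded captions found in the sampled frames?
  language?: string;           // best-guess language of the on-screen text
  sampleTimestamps: number[];  // seconds into the video where captions were seen
}

const example: BurnedInCaptionsResult = {
  detected: true,
  language: "en",
  sampleTimestamps: [12.0, 45.5, 88.2],
};
```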

Caption Translation

Evalite suite coverage

Cases: 15 · Avg Score: 0.94 · Avg Latency: 17.02s · Avg Cost: $0.0077

Converts captions into multiple languages, helping you reach global audiences without manual translation work.
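
A sketch of the data flow, with translateCaptions() as a hypothetical stand-in for the real workflow:

```ts
// Hypothetical sketch; translateCaptions() is illustrative, not the published API.
const englishVtt = "WEBVTT\n\n00:00.000 --> 00:02.500\nWelcome to the demo.";

async function translateCaptions(
  vtt: string,
  targetLanguages: string[],
): Promise<Record<string, string>> {
  const out: Record<string, string> = {};
  for (const lang of targetLanguages) {
    // The real workflow would translate each cue's text while preserving timings.
    out[lang] = vtt; // placeholder: cue text left untranslated in this sketch
  }
  return out;
}

const translated = await translateCaptions(englishVtt, ["es", "de", "ja"]);
```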

Chapters

Evalite suite coverage

Cases: 10 · Avg Score: 0.96 · Avg Latency: 8.32s · Avg Cost: $0.0047

Automatically segments long-form video content into navigable chapters with timestamps and titles—enabling viewers to jump to key moments instantly.
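
Illustratively, the output is a list of timestamped titles; the field names below are assumptions, not the documented schema:

```ts
// Hypothetical chapter shape.
interface Chapter {
  startTime: number; // seconds from the start of the video
  title: string;     // short, viewer-facing chapter title
}

const chapters: Chapter[] = [
  { startTime: 0, title: "Introduction" },
  { startTime: 93, title: "Feature walkthrough" },
  { startTime: 412, title: "Q&A" },
];
```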

Summarization

Evalite suite coverage

Cases: 5 · Avg Score: 0.94 · Avg Latency: 8.81s · Avg Cost: $0.004

Generates concise summaries and smart tags from your content—perfect for search, discovery, and quick recaps.
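
A plausible output shape, again with illustrative field names rather than the documented schema:

```ts
// Hypothetical output shape for summarization.
interface SummaryResult {
  summary: string; // concise recap of the content
  tags: string[];  // smart tags for search and discovery
}

const result: SummaryResult = {
  summary: "A two-minute walkthrough of the new editor, closing with a Q3 launch date.",
  tags: ["product demo", "editor", "launch"],
};
```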

Recent Runs

muxinc/ai · main · d5b5d84 · Feb 18, 09:12 PM · @mux/ai v0.7.4 · completed
muxinc/ai · main · 11c4311 · Feb 18, 08:55 PM · @mux/ai v0.7.3 · completed
muxinc/ai · vb/add-smaller-models-to-published-evals · 22e484b · Feb 18, 08:55 PM · @mux/ai v0.7.3 · completed
muxinc/ai · main · 9298365 · Feb 18, 01:10 PM · @mux/ai v0.7.3 · completed
muxinc/ai · main · c942a12 · Feb 17, 08:37 PM · @mux/ai v0.7.3 · completed

How to read this

Each workflow card summarizes the latest eval suite for that workflow. Metrics are aggregated for the most recent run only.

  1. Case count shows total provider/model executions.
  2. Avg score is the mean of Evalite case scores.
  3. Latency and cost are averaged across the run; see the sketch below.
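
A minimal sketch of that aggregation, assuming each case in a run reports a score, a latency, and a cost (the CaseResult fields are illustrative):

```ts
// One row per provider/model execution in the run.
interface CaseResult {
  score: number;     // Evalite case score, 0 to 1
  latencyMs: number; // wall-clock latency for the case
  costUsd: number;   // provider cost for the case
}

function aggregate(cases: CaseResult[]) {
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  return {
    cases: cases.length,                               // case count
    avgScore: mean(cases.map((c) => c.score)),         // mean of Evalite scores
    avgLatencyMs: mean(cases.map((c) => c.latencyMs)), // averaged across the run
    avgCostUsd: mean(cases.map((c) => c.costUsd)),
  };
}
```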