With MT Studio you can evaluate multiple MT models from many MT providers in one simple interface. Start by computing scores for all your engines' translations.
At the moment, MT Studio supports only Google Chrome and Mozilla Firefox browsers.
Select Create Project. Enter a name for your new project, then choose the source and target languages from the list.
Upload your data
Upload a spreadsheet with text segments, their reference translations, and up to 30 machine translations.
The spreadsheet must follow the required structure. See the requirements here. Select Download samples to download a sample file.
The optimal number of segments for evaluation is 2,000. If your spreadsheet has more segments, MT Studio will take only the first 2,000.
On some plans, MT Studio can compute scores for up to 10,000 segments.
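If you want to control which segments are scored instead of relying on the automatic cut-off, you can trim the file yourself before uploading. The sketch below is only an illustration: it assumes a CSV export with one header row, and the column names (source, reference, engine_a) and the function trim_segments are hypothetical, not the structure MT Studio requires.

```python
import csv
import io

MAX_SEGMENTS = 2000  # default limit stated above; some plans allow up to 10,000


def trim_segments(csv_text, limit=MAX_SEGMENTS):
    """Return CSV text with the header row plus at most `limit` segment rows."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for i, row in enumerate(reader):
        if i > limit:  # row 0 is the header, rows 1..limit are segments
            break
        writer.writerow(row)
    return out.getvalue()


# Hypothetical layout: source, reference, then one column per MT engine.
header = "source,reference,engine_a\n"
rows = "".join(f"src {n},ref {n},mt {n}\n" for n in range(2500))
trimmed = trim_segments(header + rows)
print(len(trimmed.splitlines()) - 1)  # number of segment rows kept
```

Keeping the first rows mirrors what the text above describes. If your file is ordered by domain or date, you may prefer to shuffle or sample it before trimming.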
Choose metrics to compute
In MT Studio, you can compute five metrics: COMET, BERTScore, hLEPOR, TER, and BLEU. Select the ones you need, then select Create Project.
MT Studio will compute quality scores for all models' translations. This takes about a minute for hLEPOR, BLEU, and TER, and several minutes for COMET and BERTScore. The more models you have, the longer scoring takes.
When scoring is complete, you'll see the scores for all models. Models are sorted from highest-scoring to lowest-scoring by the first metric you've chosen. Note: higher translation quality means higher hLEPOR, BLEU, COMET, and BERTScore, but lower TER.
For hLEPOR, COMET, and BERTScore, MT Studio computes 83% confidence intervals. You can see them right under the corpus scores.
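This article does not say how MT Studio computes these intervals. As a rough illustration of what an 83% confidence interval around a corpus score means, the sketch below uses a percentile bootstrap over segment scores. Both the method and the idea that the corpus score is the mean of the segment scores are assumptions; the function bootstrap_ci and the sample numbers are made up.

```python
import random
import statistics


def bootstrap_ci(segment_scores, level=0.83, n_resamples=1000, seed=0):
    """Percentile-bootstrap interval for the mean of segment-level scores."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(segment_scores)
    # Resample the segments with replacement and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(segment_scores, k=n))
        for _ in range(n_resamples)
    )
    alpha = (1.0 - level) / 2.0  # share of resamples cut off on each side
    lower = means[int(alpha * n_resamples)]
    upper = means[min(n_resamples - 1, int((1.0 - alpha) * n_resamples))]
    return lower, upper


# Made-up segment scores, for illustration only.
scores = [0.62, 0.71, 0.55, 0.80, 0.67, 0.74, 0.59, 0.69, 0.77, 0.64]
corpus_score = statistics.fmean(scores)
low, high = bootstrap_ci(scores)
print(f"{corpus_score:.3f} [{low:.3f}, {high:.3f}]")
```

A narrow interval suggests the score would change little on a similar test set. A wide one suggests the test set is too small or too varied to separate models with close scores.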
To sort models by another metric, select the metric name.
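If you work with the downloaded scores outside MT Studio, remember that TER runs in the opposite direction from the other metrics. The sketch below ranks models from best to worst for a chosen metric. The model names and scores are invented, rank_models is a hypothetical helper, and treating "best first" as the order for TER is an assumption, since the article does not say how the interface orders it.

```python
# Metrics where a lower value means better quality (per the note above, only TER).
LOWER_IS_BETTER = {"TER"}


def rank_models(scores, metric):
    """Return model names ordered from best to worst for the given metric."""
    descending = metric not in LOWER_IS_BETTER
    return sorted(scores, key=lambda name: scores[name][metric], reverse=descending)


# Made-up corpus scores for three hypothetical models.
scores = {
    "engine_a": {"BLEU": 31.2, "TER": 54.0},
    "engine_b": {"BLEU": 35.8, "TER": 49.5},
    "engine_a_custom": {"BLEU": 38.1, "TER": 51.3},
}
print(rank_models(scores, "BLEU"))
print(rank_models(scores, "TER"))
```

In this invented data the two metrics disagree about the top model, which is why it helps to check more than one metric before choosing an engine.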
Select Show charts to see visualizations of the corpus scores for each model.
How to read the charts
Scoring charts make it easy to tell stock MT models apart from custom models. If a model name contains the word "custom", its bar is light blue on the charts. All other models are dark blue.
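Because the color depends only on the model name, it helps to name your columns consistently before uploading. The rule can be sketched as follows. The function bar_color is hypothetical, and matching "custom" regardless of letter case is an assumption; the article does not say whether the check is case-sensitive.

```python
def bar_color(model_name):
    """Pick a chart color from the model name, following the rule described above."""
    # Assumption: the match ignores letter case.
    return "light blue" if "custom" in model_name.lower() else "dark blue"


for name in ["engine_a", "engine_a_custom", "Engine B Custom v2"]:
    print(name, "->", bar_color(name))
```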
The height of each bar shows the corpus score for the whole test set. The black ticks on the bars mark the 83% confidence intervals.
BLEU and TER are corpus-level metrics, so the BLEU and TER charts have no confidence intervals.
Select Download results to download a report with scores. The report includes:
- a spreadsheet with corpus scores
- bar charts with models ranked according to the metrics you chose
- a spreadsheet with segment scores (not available for some plans)
In MT Studio, you can compare two translations in detail, for example translations by a stock MT model and its customized version.
To start an in-depth analysis, select two models, then select Analyze and choose a metric for analysis. Our MT customization analysis tool will open. See more about MT customization analysis here.
All analyses of pairs of models will be saved in the Analysis history tab in your project.
Delete a project
To delete a project, select the button in the top right, then select Delete project.