With MT Studio, you can train MT models from multiple MT providers and evaluate them in a single interface.
At the moment, MT Studio supports only Google Chrome and Mozilla Firefox browsers.
Create a project
On the start page, select Create project. Enter a name for your new project and the source and target languages.
If you want to train and evaluate custom MT models, read more in Training custom MT models.
If you already have machine translations and you want to evaluate them, see Evaluate MT below.
Evaluate MT
Upload your data
Upload a spreadsheet with text segments, their reference translations, and up to 30 machine translations.
The spreadsheet must follow the required structure. See the requirements here. Select Download samples to download a sample file.
The optimal number of segments for evaluation is 2,000. If your spreadsheet has more segments, MT Studio uses only the first 2,000.
On some plans, MT Studio can compute scores for up to 10,000 segments.
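If you want to check before uploading which segments would be kept, the "first 2,000" rule can be reproduced locally. The snippet below is a minimal sketch, not MT Studio's own code: it assumes the spreadsheet has been exported to CSV with one header row, and the column names (source, reference, mt_1) and the function name first_segments are made up for illustration.

```python
import csv
import io

MAX_SEGMENTS = 2000  # the limit described above; some plans allow up to 10,000


def first_segments(csv_text, limit=MAX_SEGMENTS):
    """Return the header row and at most `limit` data rows, in file order."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # zip stops as soon as range(limit) is exhausted, so extra rows are ignored
    rows = [row for _, row in zip(range(limit), reader)]
    return header, rows


# Hypothetical 2,500-segment file with illustrative column names
sample = "source,reference,mt_1\n" + "".join(
    f"src {i},ref {i},mt {i}\n" for i in range(2500)
)
header, rows = first_segments(sample)
print(header, len(rows))
```

Segments beyond the limit are dropped in file order, so if your test set is sorted (for example by domain or length), consider shuffling it before upload.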
Choose metrics to compute
In MT Studio, you can compute five metrics: COMET, BERTScore, hLEPOR, TER, and BLEU. Select the ones you need, then select Create project.
MT Studio will compute quality scores for all models' translations. This takes about a minute for hLEPOR, BLEU, and TER, and several minutes for COMET and BERTScore. The more models you have, the longer scoring takes.
Note on COMET and BERTScore: before computing these metrics, MT Studio converts both the machine translations and the reference translations to lowercase. As a result, capitalization errors do not lower the score.
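The effect of lowercasing can be seen in a short sketch. It assumes plain Python str.lower() as the normalization step; the exact routine MT Studio uses is not documented here, and the function name lowercase_pair is hypothetical.

```python
def lowercase_pair(hypothesis, reference):
    """Lowercase a machine translation and its reference before scoring."""
    return hypothesis.lower(), reference.lower()


# A translation that differs from the reference only in capitalization
hyp, ref = lowercase_pair(
    "the eiffel tower is in paris.",
    "The Eiffel Tower is in Paris.",
)
print(hyp == ref)
```

After normalization the two strings are identical, so a metric computed on them cannot penalize the missing capitals.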
Scoring results
When scoring is complete, you'll see the scores for all models. Models are sorted from highest-scoring to lowest-scoring by the first metric you chose. Note: for hLEPOR, BLEU, COMET, and BERTScore, a higher score means higher translation quality. For TER, a lower score means higher quality.
For hLEPOR, COMET, and BERTScore, MT Studio computes 83% confidence intervals. You can see them right under the corpus scores.
To sort models by another metric, select the metric name.
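If you work with the downloaded scores outside MT Studio, keep the metric direction in mind when ranking. The sketch below ranks models best-first and treats TER as lower-is-better. The model names and score values are invented for illustration, and rank_models is a hypothetical helper, not part of MT Studio.

```python
# TER is the only supported metric where a lower score is better
LOWER_IS_BETTER = {"TER"}


def rank_models(scores, metric):
    """Return model names ordered from best to worst on `metric`."""
    descending = metric not in LOWER_IS_BETTER
    return sorted(scores, key=lambda name: scores[name][metric], reverse=descending)


# Invented example values, not real results
scores = {
    "stock": {"BLEU": 31.2, "TER": 55.0},
    "custom": {"BLEU": 35.8, "TER": 49.5},
    "other": {"BLEU": 33.0, "TER": 58.1},
}
print(rank_models(scores, "BLEU"))
print(rank_models(scores, "TER"))
```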
Scoring charts
Select Show charts to see visualizations of the corpus scores for each model.
How to read the charts
Scoring charts make it easy to tell stock MT models apart from custom models. If a model name contains the word "custom", its bar is light blue on the charts. All other models are dark blue.
The height of the bar shows the corpus score for the whole test set. The black ticks on the bars are 83% confidence intervals.
BLEU and TER are corpus metrics, so there are no confidence intervals on the BLEU and TER charts.
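To get an intuition for what an 83% confidence interval on a corpus score means, here is a minimal sketch that estimates one from segment scores with a percentile bootstrap. This is an assumption for illustration only: this guide does not say which method MT Studio uses, and the function name bootstrap_ci, the number of resamples, and the sample scores are all made up.

```python
import random
import statistics


def bootstrap_ci(segment_scores, level=0.83, n_resamples=1000, seed=0):
    """Percentile-bootstrap confidence interval for the mean segment score."""
    rng = random.Random(seed)
    n = len(segment_scores)
    # Resample the test set with replacement and record each resample's mean
    means = sorted(
        statistics.fmean(rng.choices(segment_scores, k=n))
        for _ in range(n_resamples)
    )
    alpha = (1.0 - level) / 2.0
    lower = means[int(alpha * n_resamples)]
    upper = means[min(n_resamples - 1, int((1.0 - alpha) * n_resamples))]
    return lower, upper


# Invented segment scores for a 100-segment test set
scores = [0.5 + 0.001 * i for i in range(100)]
low, high = bootstrap_ci(scores)
print(statistics.fmean(scores), low, high)
```

An approach like this needs a score for every segment, which fits the note above that metrics computed only at corpus level are shown without intervals.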
Download results
Select Download results to download a report with scores. The report includes:
- a spreadsheet with corpus scores
- bar charts with models ranked according to the metrics you chose
- a spreadsheet with segment scores (not available for some plans)
Analyze customization
In MT Studio, you can compare two translations in detail, for example, the output of a stock MT model and the output of its customized version.
To start an in-depth analysis, select two models, then select Analyze and choose a metric for analysis. Our MT customization analysis tool will open. See more about MT customization analysis here.
Every analysis of a pair of models is saved in the Analysis history tab in your project.
Delete a project
To delete a project, select the button in the top right, then select Delete project.