With Scoring, you can estimate the quality of machine translations quickly.
There are more than 30 MT providers on the market. Getting human experts to evaluate them all is costly and time-consuming. Which providers should you consider first? Are there outliers that you shouldn't spend time on at all? Corpus scores may have the answers.
Upload translations and get them ranked according to established quality metrics. The best provider for your content is likely to be among the leaders of the ranking.
At the moment, Scoring supports only Google Chrome and Mozilla Firefox browsers.
Start working with Scoring
Load your test set
Upload a spreadsheet with text segments, their reference translations, and any number of machine translations. The spreadsheet must follow the same structure as the sample file. To get the sample file, select Download sample in the pane on the right.
The optimal number of segments for evaluation is 2,000. If your spreadsheet has more segments, Scoring takes only the first 2,000.
On some plans, Scoring can compute scores for up to 10,000 segments.
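Before uploading, it can help to check locally how many segments the file contains and whether any cells are empty. The snippet below is a minimal sketch, not part of Scoring: it assumes the spreadsheet has been exported to CSV with a header row, a source column, a reference column, and one or more machine translation columns. The function name check_test_set and this column layout are hypothetical. The actual layout is the one in the sample file.

```python
import csv
import io

SEGMENT_LIMIT = 2000  # default limit described above; some plans allow up to 10,000


def check_test_set(csv_text, limit=SEGMENT_LIMIT):
    """Return (rows that would be scored, number of rows beyond the limit, problems)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Ignore completely empty rows, e.g. trailing blank lines.
    rows = [row for row in reader if any(cell.strip() for cell in row)]

    problems = []
    # Assumed layout: source, reference, then at least one MT column.
    if len(header) < 3:
        problems.append("expected a source, a reference, and at least one MT column")
    for number, row in enumerate(rows, start=1):
        if len(row) != len(header) or not all(cell.strip() for cell in row):
            problems.append(f"segment {number} has a missing or extra cell")

    return rows[:limit], max(0, len(rows) - limit), problems
```

For a file with 2,500 segments and the default limit, the second returned value would be 500: those segments would be left out of the evaluation.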
Enter the source and target languages in the From and To fields.
Select Start scoring.
Scoring will compute quality scores for all translations. This takes a couple of minutes.
When scoring is complete, you'll see the scores for all translations. Translations are sorted from highest-scoring to lowest-scoring by the first metric you've chosen. Note: for hLEPOR and BLEU, a higher score means higher translation quality; for TER, a lower score means higher quality.
To sort translations by another metric, select the metric name.
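If you work with the downloaded scores outside of Scoring, keep the direction of each metric in mind when ranking. The snippet below is a minimal sketch of a best-to-worst ranking that respects it. The provider names and score values are made up for illustration, and rank_translations is a hypothetical helper, not a Scoring function.

```python
# Direction of each metric, as noted above:
# higher hLEPOR and BLEU mean better quality, lower TER means better quality.
HIGHER_IS_BETTER = {"hLEPOR": True, "BLEU": True, "TER": False}


def rank_translations(scores, metric):
    """Return translation names ordered from best to worst by the given metric."""
    return sorted(
        scores,
        key=lambda name: scores[name][metric],
        reverse=HIGHER_IS_BETTER[metric],
    )


# Made-up corpus scores for three hypothetical translations.
scores = {
    "provider_a": {"hLEPOR": 0.71, "BLEU": 0.32, "TER": 0.55},
    "provider_b": {"hLEPOR": 0.68, "BLEU": 0.35, "TER": 0.49},
    "provider_c_custom": {"hLEPOR": 0.74, "BLEU": 0.30, "TER": 0.60},
}

print(rank_translations(scores, "hLEPOR"))
print(rank_translations(scores, "TER"))
```

With these made-up numbers, the leader changes depending on the metric, which is why it is worth checking more than one ranking before shortlisting providers.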
Select Download results to download a report with scores. The report includes:
- a spreadsheet with corpus scores
- bar charts with translations sorted by quality according to the metrics you chose
- a spreadsheet with segment scores (not available for some plans)
How to read the charts
Scoring charts make it easy to tell apart translations by stock MT models and custom models. If the translation name has the word "custom" in it, the translation is light blue on the charts. Other translations are dark blue.
The height of the bar shows the corpus score for the whole test set. The black ticks on the bars mark confidence intervals: 83% of segment scores fall within these intervals.
BLEU is a corpus metric, so there are no confidence intervals on the BLEU chart.
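The chart conventions above can be sketched in a few lines. This is an illustration only, not how Scoring draws its charts: the case-insensitive match on "custom" is an assumption, and central_interval reads "83% of segment scores fall within these intervals" as a central percentile range, which is one possible interpretation. Both function names are hypothetical.

```python
import statistics


def bar_color(translation_name):
    # Rule from the text: names containing "custom" are light blue, others dark blue.
    # Case-insensitive matching is an assumption.
    return "light blue" if "custom" in translation_name.lower() else "dark blue"


def central_interval(segment_scores, coverage=0.83):
    """Return bounds that contain roughly `coverage` of the segment scores."""
    # 199 cut points; the cut point with 1-based index i sits at the i/200 quantile.
    cuts = statistics.quantiles(segment_scores, n=200, method="inclusive")
    tail = (1 - coverage) / 2  # 8.5% on each side for 83% coverage
    low_index = round(tail * 200) - 1
    high_index = round((1 - tail) * 200) - 1
    return cuts[low_index], cuts[high_index]
```

An interval like this needs per-segment scores, so it applies to hLEPOR and TER but not to BLEU, which is reported only for the corpus as a whole.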
Save your project
Saving projects is only available for users on some plans.