With MT Studio, you can train custom machine translation models on different platforms from the same interface.
Important! To train models, you need to have a direct contract with the MT provider.
How MT customization works
When you train a custom MT model, you adapt a provider's existing MT model so that it translates specific content in a specific way. To customize a model, you give it parallel text segments: source texts and their translations into a target language. The model processes the segments and learns to translate words and phrases the way you want.
Compared to stock MT models, custom models produce translations that are closer to reference translations.
Before you begin
Create accounts with all MT providers whose models you want to customize. Then connect those accounts to your Intento account. See Providers' details below for instructions.
To train custom models, you need one or more files with parallel text segments. For best training results, the total number of segments should be between 15,000 and 100,000. Each file must be no larger than 250 MB. Some MT providers require a minimum number of segments to train a custom model: see Providers' details below.
MT Studio works with tmx, csv, and tsv files.
- csv and tsv files must contain only two columns: source and target. There must not be any other columns in the file.
- tmx files must contain only two languages: the first will be the source language and the second will be the target language.
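The csv and tsv requirements above can be checked locally before you upload. The following is a minimal sketch, not part of MT Studio: the function name `check_delimited`, the returned dictionary keys, and the choice to count every non-empty row as a segment (no header row) are assumptions made for illustration. It also reads "250 MB" as 250 × 1024 × 1024 bytes, which is an assumption.

```python
import csv
import io

# Limits taken from this article; reading "MB" as MiB is an assumption.
MAX_FILE_BYTES = 250 * 1024 * 1024
RECOMMENDED_MIN = 15_000
RECOMMENDED_MAX = 100_000


def check_delimited(text: str, delimiter: str = ",") -> dict:
    """Check that csv/tsv text has exactly two columns per row.

    Hypothetical helper: every non-empty row is counted as one segment,
    so a header row (if your file has one) would be counted too.
    """
    problems = []
    segments = 0
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for row_no, row in enumerate(reader, start=1):
        if not row:
            continue  # blank line, ignore
        if len(row) != 2:
            problems.append(f"row {row_no}: expected 2 columns, found {len(row)}")
            continue
        segments += 1

    size_bytes = len(text.encode("utf-8"))
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes, above the 250 MB limit")

    return {
        "segments": segments,
        "problems": problems,
        "in_recommended_range": RECOMMENDED_MIN <= segments <= RECOMMENDED_MAX,
    }


if __name__ == "__main__":
    sample = "Hello\tHallo\nGood morning\tGuten Morgen\n"
    print(check_delimited(sample, delimiter="\t"))
```

For a real file, read it with `open(path, encoding="utf-8", newline="")` and pass the contents to the function. Provider-specific minimums from Providers' details still apply on top of the recommended range.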
Providers' details
|Provider||Connected accounts||Training costs||Minimum number of segments for training||Training time|
|Google||To connect a Google account, follow the instructions here.||The price of training a Google model depends on the amount of training data. The minimum is approximately $100 and the maximum per training is fixed at $300.||1,000||3-6 hours|
|ModernMT||To connect a ModernMT account, follow the instructions here.||Free||No limitations||Seconds|
Create a project
Go to Intento MT Studio. Select MT Studio, then select Create project. Enter a name for your new project and the source and target languages.
Under Dataset, select I want to train custom models.
Upload files with parallel data.
Choose a provider in the Provider field.
Choose an account in the Connected account field. If you're having trouble with accounts, contact your organization's admin or Intento support.
To train a model from another provider, select Add provider and choose another provider and connected account.
Select Create project.
Translate with models
After you start training, MT Studio shows the status of your models: Completed if the model is ready or In progress if it's still training. If training has failed, you'll see Error and a redo arrow — select the arrow to start training again.
When the model is ready, you'll see its Model ID and Model key, which you need to translate with this model.
To evaluate the new custom models, translate a test set of text segments. The test set is a random selection of segments from the training data that we removed before training, so the model has not seen them.
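The idea of a held-out test set can be sketched as follows. This is an illustration only: the article does not describe how MT Studio selects the test segments or how many it removes, so the function name `split_test_set`, the `test_size` parameter, and the seeded random sampling are all assumptions.

```python
import random


def split_test_set(pairs, test_size, seed=0):
    """Randomly remove `test_size` segment pairs from the training data.

    Hypothetical sketch: returns (train, test) with no pair position in
    both, so a model trained on `train` has not seen `test`.
    """
    if not 0 <= test_size <= len(pairs):
        raise ValueError("test_size must be between 0 and len(pairs)")
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    test_idx = set(rng.sample(range(len(pairs)), test_size))
    train = [p for i, p in enumerate(pairs) if i not in test_idx]
    test = [p for i, p in enumerate(pairs) if i in test_idx]
    return train, test


if __name__ == "__main__":
    data = [(f"source {i}", f"target {i}") for i in range(10)]
    train, test = split_test_set(data, test_size=3)
    print(len(train), len(test))
```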
Select Translate with models. Translating takes a few minutes: the more models you have, the longer it takes.
When translations are ready, you'll see Translations completed. Select OK, clear.
To see the translations, select the Files tab and download the test file: it has the source test segments, their reference translations, and all machine translations.
To evaluate translation quality, compute MT quality metrics.
Select Score Metrics. In MT Studio, you can compute five metrics: COMET, BERTScore, hLEPOR, TER, and BLEU. Learn more about MT quality metrics here.
Select metrics to compute, then select Start scoring.
MT Studio will compute quality scores for all models' translations. This takes about a minute for hLEPOR, BLEU, and TER, and several minutes for COMET and BERTScore. The more models you have, the longer scoring takes.
Note on COMET and BERTScore: before computing these metrics, MT Studio converts the machine translations and the reference translations to lowercase. As a result, capitalization errors do not lower the score.
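To see why lowercasing hides capitalization errors, consider the sketch below. It does not compute COMET or BERTScore; it uses a toy exact-match rate as a stand-in metric, and the names `normalize` and `exact_match_rate` are hypothetical.

```python
def normalize(segment: str) -> str:
    """Lowercase a segment, mirroring the preprocessing described above."""
    return segment.lower()


def exact_match_rate(hypotheses, references, lowercase=True):
    """Toy stand-in metric: share of hypotheses identical to their reference."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")
    if lowercase:
        hypotheses = [normalize(h) for h in hypotheses]
        references = [normalize(r) for r in references]
    matches = sum(h == r for h, r in zip(hypotheses, references))
    return matches / len(references)


if __name__ == "__main__":
    hyp = ["berlin is big"]
    ref = ["Berlin is big"]
    print(exact_match_rate(hyp, ref, lowercase=True))
    print(exact_match_rate(hyp, ref, lowercase=False))
```

With lowercasing, the two segments count as identical; without it, the capitalization difference makes them a mismatch.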
When scoring is complete, you'll see the scores for all models. Models are sorted from highest-scoring to lowest-scoring by the first metric you've chosen. To sort models by another metric, select the metric name. Note: higher translation quality means higher hLEPOR, BLEU, COMET, and BERTScore, but lower TER.
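If you work with downloaded scores outside MT Studio, keep the direction of each metric in mind when ranking. The sketch below assumes a hypothetical `scores` dictionary keyed by model name; the model names and numbers are made up for illustration and are not real results.

```python
# TER is the only metric in this article where lower is better.
LOWER_IS_BETTER = {"TER"}


def rank_models(scores: dict, metric: str) -> list:
    """Return model names ordered from best to worst by `metric`."""
    best_is_highest = metric not in LOWER_IS_BETTER
    return sorted(scores, key=lambda name: scores[name][metric], reverse=best_is_highest)


if __name__ == "__main__":
    # Illustrative values only.
    scores = {
        "stock": {"BLEU": 30.0, "TER": 55.0},
        "custom": {"BLEU": 38.0, "TER": 47.0},
    }
    print(rank_models(scores, "BLEU"))
    print(rank_models(scores, "TER"))
```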
For hLEPOR, COMET, and BERTScore, MT Studio computes 83% confidence intervals. You can see them right under the corpus scores.
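The article does not say how MT Studio computes these intervals. One common approach for metrics built from segment scores is a percentile bootstrap, sketched below under two assumptions: the corpus score is the mean of the segment scores, and the interval comes from resampling segments with replacement. The function name `bootstrap_ci` and its parameters are hypothetical.

```python
import random
import statistics


def bootstrap_ci(segment_scores, level=0.83, resamples=1000, seed=0):
    """Percentile bootstrap interval for the mean of segment scores.

    Assumption: this is a generic technique, not MT Studio's documented method.
    """
    if not segment_scores:
        raise ValueError("segment_scores must not be empty")
    rng = random.Random(seed)
    n = len(segment_scores)
    # Mean of each resample, drawn with replacement from the segment scores.
    means = sorted(
        statistics.fmean(rng.choices(segment_scores, k=n))
        for _ in range(resamples)
    )
    lower_idx = int((1 - level) / 2 * resamples)
    upper_idx = max(lower_idx, int((1 + level) / 2 * resamples) - 1)
    return means[lower_idx], means[upper_idx]


if __name__ == "__main__":
    # Illustrative segment scores only.
    scores = [0.61, 0.72, 0.58, 0.80, 0.66, 0.74, 0.69, 0.55]
    low, high = bootstrap_ci(scores)
    print(f"corpus score {statistics.fmean(scores):.3f}, 83% CI [{low:.3f}, {high:.3f}]")
```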
Select Show charts to see visualizations of the corpus scores for each model.
How to read the charts
Scoring charts make it easy to tell apart translations by stock MT models and custom models: custom models are light blue and stock models are dark blue.
The height of the bar shows the corpus score for the whole test set. The black ticks on the bars are 83% confidence intervals.
BLEU and TER are corpus metrics, so there are no confidence intervals on the BLEU and TER charts.
Select Download results to download a report with scores. The report includes:
- a spreadsheet with corpus scores
- bar charts with model ranking
- a spreadsheet with segment scores (not available for some plans)
In MT Studio, you can compare two translations in detail, for example translations by a stock MT model and its custom version.
To start an in-depth analysis, select two models, then select Analyze and choose a metric for analysis. Our MT customization analysis tool will open. See more about MT customization analysis here.
All analyses of pairs of models will be saved in the Analysis history tab in your project.
Delete a project
To delete a project, select the button in the top right, then select Delete project.