A Study on the Improvement of Cross-domain Generalization Ability of Neural Machine Translation Based on Self-supervised Learning

By: Jing Li¹
¹Doctoral Student, Tianjin Foreign Studies University, Tianjin 300204, China

Abstract

Aiming at the problem of insufficient generalization ability of Neural Machine Translation (NMT) in cross-domain scenarios, this study proposes an LSTM-RNNs-attention model that integrates self-supervised multimodal features with an improved attention mechanism. Through multimodal self-supervised learning and text-preprocessing optimization, the model constructs an image-text consistency classification and keyword annotation algorithm from image-text semantic correlation, which, combined with the LSTM-CRF sequential word segmentation technique, significantly improves the accuracy of source-language semantic representation. Experimental results show that the model's F1 score reaches 99.13% when the character vector dimension is 125, and performance on the Chinese word segmentation task is optimal at a Dropout ratio of 30%. For robustness to input noise, the UNK-Tag strategy of the random word dropout mechanism achieves a BLEU score of 47.63 at a sampling probability of 0.15, 3.81% higher than the baseline. In the multilingual translation task, the BLEU scores of the LSTM-RNNs-attention model on English-Chinese (Eng-Ch), Japanese-Chinese (Jap-Ch), and German-Chinese (Ger-Ch) are 45.82, 42.32, and 32.91, respectively, outperforming the mainstream baseline models BERT-fused NMT and Multilingual NMT by 2.6-18.0 points on average, while convergence time shortens to 6.58 s (Eng-Ch), significantly better than Transformer's 16.13 s and RNN-NMT's 11.79 s. Manual evaluation further validates the model's advantage in semantic coherence, with the Eng-Ch task scoring 9.66 out of 10. Through self-supervised multimodal feature fusion, dynamic attention weight allocation, and word segmentation optimization, the study effectively addresses semantic bias and long-distance dependency in cross-domain translation.
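
The UNK-Tag word-dropout strategy mentioned above can be illustrated with a minimal sketch: during training, each source token is independently replaced with an unknown-word tag at a fixed sampling probability, injecting noise that encourages robustness to unseen or corrupted input. The snippet below is an assumption-laden illustration of that idea, not the paper's implementation; the function name `unk_tag_dropout`, its parameters, and the `<UNK>` token string are hypothetical.

```python
import random

def unk_tag_dropout(tokens, p=0.15, unk_token="<UNK>", seed=None):
    """Randomly replace source tokens with an unknown-word tag.

    A minimal sketch of a UNK-Tag random word dropout, assuming each
    token is masked independently with probability p (the abstract
    reports p = 0.15 as the best sampling probability).
    """
    rng = random.Random(seed)
    return [unk_token if rng.random() < p else tok for tok in tokens]

# Example: at p = 0.15, roughly 15% of source tokens are masked per pass.
source = "the model learns robust cross-domain representations".split()
print(unk_tag_dropout(source, p=0.15, seed=0))
```

Applied only at training time (inference sees the unmodified source), this kind of dropout acts as a regularizer on the encoder's input, which is consistent with the robustness gain the abstract reports for the 0.15 setting.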