Multi-view subword regularization

Multilingual pretrained representations rely on subword segmentation algorithms to create a shared multilingual vocabulary. However, these heuristic algorithms often produce sub-optimal segmentations, especially for languages with limited amounts of data. In this paper, we show that applying subword regularization techniques (Kudo, 2018; Provilkov et al., 2020) during fine-tuning boosts the cross-lingual transfer ability of multilingual representations. To take full advantage of the different possible input segmentations, we propose Multi-view subword regularization, a method that enforces prediction consistency between inputs tokenized with the standard segmentation and with a subword-regularized segmentation. We evaluate the zero-shot cross-lingual transfer of multilingual pretrained representations using our method on the XTREME multilingual benchmark (Hu et al., 2020), where it brings consistent improvements of up to 2.5 points over state-of-the-art models.
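
To make the consistency idea concrete, the following is a minimal sketch of one way such an objective could be written for a single classification example. The abstract does not state the exact form of the loss, so the combination below (average cross-entropy over the two views plus a symmetric KL term) is an assumption, as are the names multi_view_loss and consistency_weight. The two logit vectors stand in for a model's outputs on the same input under the standard and the subword-regularized segmentation.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the gold label."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def multi_view_loss(logits_std, logits_reg, label, consistency_weight=1.0):
    """Hypothetical multi-view objective (assumed form, not taken from the text).

    logits_std: model scores for the input under the standard segmentation.
    logits_reg: model scores for the same input under a sampled,
                subword-regularized segmentation.
    """
    p_std = softmax(logits_std)
    p_reg = softmax(logits_reg)
    # Supervised term: both views should predict the gold label.
    supervised = 0.5 * (cross_entropy(p_std, label) + cross_entropy(p_reg, label))
    # Consistency term: the two views should agree with each other.
    consistency = 0.5 * (kl_divergence(p_std, p_reg) + kl_divergence(p_reg, p_std))
    return supervised + consistency_weight * consistency
```

When the two views yield identical predictions, the consistency term vanishes and the loss reduces to ordinary cross-entropy; the more the predictions diverge, the larger the penalty, scaled by consistency_weight.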