
LLMs can be prompted to translate content from one language to another. But how do they compare to specialized translation models, which are often much smaller than LLMs?

3 Answers


It's complicated. I'd recommend looking at results from recent WMT competitions. For example, the WMT 2023 findings state:

Large Language Models (LLMs) exhibit strong performance across the majority of language pairs, although this is based only on two LLM-based system submissions. Test suite analysis revealed that although GPT4 excelled in some areas (e.g. UGC translation), it struggled with other aspects such as speaker gender translation and specific domains (e.g. legal), whereas it ranked lower than encoder-decoder systems when translating from English into less-represented languages (e.g. Czech and Russian).

Here, UGC means social/user-generated content.

Specifically, they test the GPT-4 system as described in Hendy et al. (2023), except that instead of retrieving the most relevant few-shot samples, they use "fixed random translation examples" in addition to their "predefined few-shot examples". The other LLM-based system tested was Lan-BridgeMT, although that tended to perform worse on average.
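To make that setup concrete, here is a hypothetical sketch of a fixed few-shot translation prompt. The function name, instruction wording, and example pairs below are my own illustrations, not taken from the paper:

```python
# Sketch of a fixed few-shot translation prompt, loosely following the
# setup described above (fixed examples reused for every input, rather
# than examples retrieved per source sentence). All names and example
# sentences here are illustrative.

def build_translation_prompt(source_sentence, examples,
                             src_lang="English", tgt_lang="German"):
    """Assemble a few-shot prompt from fixed (source, target) example pairs."""
    lines = [f"Translate the following sentences from {src_lang} to {tgt_lang}.", ""]
    for src, tgt in examples:
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
        lines.append("")
    # The model is expected to continue the prompt with the translation.
    lines.append(f"{src_lang}: {source_sentence}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)

# The same fixed examples are prepended for every sentence to translate.
FIXED_EXAMPLES = [
    ("Good morning.", "Guten Morgen."),
    ("Where is the station?", "Wo ist der Bahnhof?"),
]

prompt = build_translation_prompt("The weather is nice today.", FIXED_EXAMPLES)
print(prompt)
```

The resulting string would then be sent to the LLM's completion endpoint, with the model's continuation taken as the translation.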

GPT4 performed among the best for, e.g., English -> German:

[Image: WMT2023 English -> German results]

and worse (but still ok) for English -> Czech:

[Image: WMT2023 English -> Czech results]
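Rankings like these come primarily from human evaluation, with automatic metrics (e.g. chrF, COMET) reported alongside. As a toy illustration of what a character n-gram metric measures, here is a simplified chrF-style scorer; this is my own sketch, not the official WMT implementation:

```python
# Toy character n-gram F-score (chrF-style), for illustration only.
# Real WMT evaluation uses calibrated automatic metrics plus human
# judgments; this simplified version just shows the idea of scoring a
# hypothesis translation against a reference.
from collections import Counter

def chrf_like(hypothesis, reference, max_n=6, beta=2.0):
    """Average character n-gram precision/recall, combined as an F-beta score."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = Counter(hypothesis[i:i + n] for i in range(len(hypothesis) - n + 1))
        ref = Counter(reference[i:i + n] for i in range(len(reference) - n + 1))
        if not hyp or not ref:
            continue  # string too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0.0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

# A closer hypothesis scores higher than an unrelated one.
print(chrf_like("Wo ist der Bahnhof?", "Wo ist der Bahnhof?"))  # identical -> 1.0
print(chrf_like("Wo ist die Bahnhof?", "Wo ist der Bahnhof?"))
```

A higher score means more character n-gram overlap with the reference; system-level scores are obtained by aggregating over a whole test set.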

Finally, they also talk about some interesting errors that GPT4 makes:

Additionally, test suites providers noted that GPT4 outputs are not always faithful to the source sentence (Bawden and Sagot, 2023) and that they have some issues with speaker gender translation (Savoldi et al., 2023) and specific domains (Mukherjee and Shrivastava, 2023, e.g. legal).

As an aside, it's somewhat unclear what they mean by "LLM" in their paper: although they say "we received only one submission using LLM methods (Lan-BridgeMT), whereas one dominant commercial LLM (GPT4) was included via our own efforts", most (all?) of the other submissions also use large transformer models, including one with 10.5B parameters. Maybe they only count closed-source models or particularly large models? (Lan-BridgeMT uses GPT-3.5/4.)


The PaLM 2 Technical Report states that PaLM 2 outperforms Google Translate in some settings.


LLMs offer versatility and broad context understanding, but specialized models typically provide higher accuracy and optimization for translation tasks. Use specialized models for precision and LLMs for broader context and adaptability.

  • I don't see how this answers the question. It's not clear what "precision" means in this context, or what you mean by "broader context and adaptability". I encourage you to edit your answer to explain what you mean and provide specific definitions for those phrases. I hope I don't need to say this, but I want to bring the following policy to your attention: genai.stackexchange.com/help/gen-ai-policy
    – D.W.
    Commented Jul 1 at 19:16
