ChatGPT Outperforms Google Translate and UD Talk in Chinese-Japanese Medical Translations
Researchers at Fu Jen Catholic University Hospital recorded 20 cardiology and pulmonology outpatient visits between December 2024 and November 2025. Each encounter was audio‑recorded, transcribed verbatim, anonymised, and fed into ChatGPT, Google Translate, and the subtitle app UD Talk. Eight professional medical interpreters scored the accuracy of selected exchanges on a 6‑point Likert scale, while 14 Japanese‑speaking lay participants rated overall satisfaction.
The results were striking. ChatGPT achieved a median score of 5.0 (IQR 4.0-5.0) for both accuracy and satisfaction in both specialties, whereas Google Translate and UD Talk each had a median of 2.0 (IQR 1.0-3.0). The difference was statistically significant (P < 0.001). A similarity analysis found that Google Translate and UD Talk produced identical translations in 87% of exchanges, whereas ChatGPT's output matched theirs in only 5%.
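For readers unfamiliar with these summary statistics, the sketch below shows one way a median with interquartile range and an identical-translation share could be computed. It is illustrative only: the function names, the quartile method, and the use of exact string matching are assumptions, since the article does not describe how the study measured similarity, and the sample inputs are made-up placeholders, not study data.

```python
from statistics import median, quantiles


def median_iqr(scores):
    """Return (median, Q1, Q3) for a list of Likert ratings.

    Assumption: quartiles use the 'inclusive' method; the study's
    exact quartile definition is not stated in the article.
    """
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q1, q3


def identical_share(outputs_a, outputs_b):
    """Fraction of exchanges where two tools gave the same translation.

    Assumption: 'identical' means an exact string match after trimming
    surrounding whitespace.
    """
    if len(outputs_a) != len(outputs_b):
        raise ValueError("both tools must have one output per exchange")
    same = sum(a.strip() == b.strip() for a, b in zip(outputs_a, outputs_b))
    return same / len(outputs_a)


# Hypothetical placeholder inputs, NOT data from the study.
toy_scores = [5, 5, 4, 5, 3, 5, 4, 5]
tool_a = ["text 1", "text 2", "text 3", "text 4"]
tool_b = ["text 1", "text 2", "other", "text 4"]

print(median_iqr(toy_scores))
print(identical_share(tool_a, tool_b))
```

Reporting the median with its IQR rather than a mean suits ordinal Likert ratings, because the distances between scale points are not assumed to be equal.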
The study attributes ChatGPT's edge to the context-aware generation of its underlying GPT-4o model, which can parse colloquial speech, incomplete sentences, and specialised medical terminology. In contrast, Google's neural machine translation engine and UD Talk's speech-recognition pipeline tend toward literal, sentence-level translation. One illustrative error involved the Chinese term “腎指數”: Google Translate and UD Talk rendered it as “kidney-index finger,” while ChatGPT correctly translated it as “renal function value.”
Despite its superior performance, the authors caution that ChatGPT should not replace professional interpreters. Trained human interpreters deliver not only accurate translation but also cultural mediation and clarification, functions that current AI systems cannot fully replicate.
The study notes several limitations. The language pair is narrow (Chinese‑Japanese), the sample size is small (20 visits), and a single physician per specialty was involved, all of which may affect generalisability. The analysis relied solely on GPT‑4o; newer or alternative LLMs may perform differently.
The findings suggest that AI‑assisted translation could serve as a supplementary aid in outpatient settings where interpreter services are scarce, especially for brief interactions or preliminary triage. However, the authors recommend continued human oversight to safeguard patient safety and communication quality.
Future research should broaden the scope to other language pairs, larger datasets, and diverse clinical contexts to determine whether ChatGPT’s advantages persist across settings.
In sum, the study provides evidence that ChatGPT can outperform conventional machine-translation tools on Chinese-Japanese medical dialogue, but it underscores that such tools should complement professional interpreters rather than replace them, to ensure accurate, culturally appropriate, and safe patient-provider communication.