
    AI Translation in Refugee Services: Where Machine Translation Helps, Where It Harms

    By the Humanity Centered Data Editorial Team
    June 19, 2026 · 10 min read

    Language access is the unglamorous gating constraint of refugee services

    Almost every problem in refugee response (registration, protection screening, legal counselling, health intake, child safeguarding) bottlenecks on language. The pool of qualified interpreters in any given language is small and unevenly distributed, and demand spikes in days rather than years. AI machine translation has, since the rollout of transformer architectures in 2017 and especially since Meta's No Language Left Behind (NLLB) project in 2022, made it possible to provide some form of language access in roughly 200 languages within seconds. Organisations including Tarjimly and CLEAR Global / Translators without Borders have built operational workflows around this capability.

    Where the technology is genuinely improving outcomes

    Three contexts have meaningful supporting evidence. Asynchronous text translation of public-information materials, where a human speaker reviews the translated draft before distribution, expands the language reach of the same editorial team severalfold. Real-time chat translation for protection helplines lets a single advisor serve callers in languages the advisor does not speak, with measurable reductions in wait time. Document triage in asylum casework, where machine translation and summarisation of large volumes of evidence help a caseworker prioritise, is increasingly standard in European reception systems. All three uses share a property: a human stays in the loop on the high-stakes output.

    Where it does measurable harm

    The harms cluster around three uses. Unmediated asylum interviews: published audits of US and UK pilots have found error rates in low-resource languages that materially affect credibility assessments. Medical intake without a clinician who speaks the language: machine translation regularly mistranslates symptom descriptions in Tigrinya, Rohingya, and Pashto in ways that the clinician cannot detect. Legal advice: legal terms of art (asylum, parole, refoulement) have no stable equivalent in many target languages, and the model will produce a confident but wrong rendering.

    The language coverage gap is real and uneven

    Coverage in NLLB and its successors is wide but shallow. Quality is excellent for high-resource languages (Modern Standard Arabic, French, Spanish), moderate for major regional languages (Swahili, Hausa, Bengali), and poor for languages spoken by some of the most displaced populations (Tigrinya, Rohingya, Dari dialects, Anuak, several Sudanese Arabic varieties). Published model evaluations on the FLORES-200 benchmark give per-language quality scores; reading them before deploying in a new context is the minimum due diligence.
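
    The due-diligence step above can be made mechanical. The following is a minimal sketch, assuming you have already collected per-language benchmark scores on a 0 to 100 scale where higher is better. The thresholds, tier names, and language keys are hypothetical placeholders for illustration, not values from FLORES-200 or from any audited deployment.

```python
# Hypothetical cut-offs on a 0-100, higher-is-better score; tune per context.
REVIEW_THRESHOLD = 50.0
BLOCK_THRESHOLD = 30.0


def deployment_tier(score: float) -> str:
    """Map one benchmark score to a (hypothetical) deployment tier."""
    if score < BLOCK_THRESHOLD:
        return "human_interpreter_only"
    if score < REVIEW_THRESHOLD:
        return "mt_with_mandatory_review"
    return "mt_with_spot_checks"


def triage(scores: dict[str, float], needed: list[str]) -> dict[str, str]:
    """Assign a tier to every language the service must cover.

    Languages with no benchmark score are flagged rather than guessed at.
    """
    return {
        lang: deployment_tier(scores[lang]) if lang in scores else "no_benchmark_data"
        for lang in needed
    }


if __name__ == "__main__":
    # Placeholder scores, NOT real benchmark results.
    placeholder_scores = {"lang_high": 62.0, "lang_mid": 44.5, "lang_low": 21.0}
    needed = ["lang_high", "lang_mid", "lang_low", "lang_missing"]
    for lang, tier in triage(placeholder_scores, needed).items():
        print(f"{lang}: {tier}")
```

    Flagging missing languages explicitly matters here: a language absent from the benchmark is a coverage gap to investigate, not a pass.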

    Protection and consent

    Machine translation pipelines often involve cloud APIs that retain inputs for model improvement by default. For protection-sensitive content (asylum testimony, gender-based violence disclosures, medical records) this is incompatible with humanitarian data principles. The operative standards are the IASC data responsibility guidance and the ICRC Handbook on Data Protection in Humanitarian Action. Practical mitigations include on-device or self-hosted models for sensitive content, contractual no-retention agreements with vendors, and explicit informed consent obtained in the client's preferred language.
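
    One way to enforce these mitigations is a routing rule that runs before any text leaves the organisation. This is a minimal sketch under stated assumptions: the two sensitivity levels, the backend labels, and the function name `choose_backend` are hypothetical, and a real deployment would take its classification scheme from its own data-protection policy.

```python
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = "public"          # e.g. public-information materials
    PROTECTION = "protection"  # e.g. asylum testimony, GBV disclosures, medical records


def choose_backend(
    sensitivity: Sensitivity,
    consent_recorded: bool,
    vendor_no_retention: bool = False,
) -> str:
    """Pick a translation backend label for one piece of content.

    Assumed policy (illustrative only):
    - protection-sensitive content never goes to a cloud API;
    - it is translated only after informed consent is recorded;
    - other content may use a cloud API only if a written
      no-retention agreement with the vendor is in place.
    """
    if sensitivity is Sensitivity.PROTECTION:
        if not consent_recorded:
            return "hold_for_consent"
        return "self_hosted"
    return "cloud_api" if vendor_no_retention else "self_hosted"
```

    Defaulting to the self-hosted path when no retention agreement exists is a deliberately conservative choice in this sketch; the failure mode is slower translation rather than leaked testimony.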

    A short procurement checklist

    • Per-language quality on FLORES-200 for the languages you will actually serve.
    • Data residency and retention guarantees in writing.
    • Human-in-the-loop workflow for any high-stakes output.
    • Fallback to qualified human interpreters for protection-sensitive interviews.
    • Logging and audit trail for every translated interaction touching a case file.
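
    The last checklist item can be sketched as a small record written for each translated interaction. This is an illustration only: the field names and the `audit_record` helper are hypothetical, and storing a SHA-256 digest instead of the source text is an assumption made here so that the log itself does not retain protection-sensitive content.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def audit_record(
    case_id: str,
    source_lang: str,
    target_lang: str,
    engine: str,
    source_text: str,
    reviewer: Optional[str] = None,
    now: Optional[datetime] = None,
) -> dict:
    """Build one JSON-serialisable audit entry (hypothetical schema)."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "timestamp": timestamp,
        "case_id": case_id,
        "source_lang": source_lang,
        "target_lang": target_lang,
        "engine": engine,
        # Digest only: lets auditors match a record to a document
        # without copying the text into the log.
        "source_sha256": hashlib.sha256(source_text.encode("utf-8")).hexdigest(),
        "human_reviewed": reviewer is not None,
        "reviewer": reviewer,
    }


if __name__ == "__main__":
    entry = audit_record("case-0001", "src", "tgt", "self_hosted", "example text")
    print(json.dumps(entry, indent=2))
```

    Recording whether a human reviewed the output alongside each entry makes the human-in-the-loop requirement auditable after the fact.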

    Further reading and primary sources

    • NLLB: https://ai.meta.com/research/no-language-left-behind/
    • FLORES-200: https://github.com/facebookresearch/flores
    • Tarjimly: https://www.tarjimly.org/
    • CLEAR Global: https://clearglobal.org/
    • ICRC Handbook on Data Protection: https://www.icrc.org/en/data-protection-humanitarian-action-handbook
    • IASC data responsibility guidance: https://interagencystandingcommittee.org/