Humanity Centered Data | Quality of Life Data for All People

How We Use AI to Synthesize UNHCR, OCHA, and ACLED Data — Without Losing the Human Layer

By the Humanity Centered Data Editorial Team. Published June 18, 2026.

8 min read

Why we are publishing this

The single most common question we get from researchers, journalists, and aid workers is some variant of "how do you actually use AI on this site?" This page is the answer. It describes the pipeline, the controls, the failure modes we have hit, and what stays out of the pipeline on purpose.

The data spine

Every published number on Humanity Centered Data traces back to a primary-source dataset. The spine has four pillars.

UNHCR Refugee Data Finder and the operational portal for refugee, asylum-seeker, and statelessness statistics.
OCHA Humanitarian Data Exchange (HDX) for sectoral indicators, response plans, and population-in-need figures.
ACLED for geocoded conflict events.
IDMC GRID and GIDD for internal displacement stock and flow.

Supporting sources (IOM DTM, ReliefWeb, JRC, FAO, WHO, World Bank) feed specific thematic pages. The sourceFreshness and sourceAttribution layers in the codebase enforce the rule that every chart and every claim links to its primary source.
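The attribution rule can be sketched as a check that runs before anything renders. This is a minimal illustration, not the site's sourceAttribution code, which is not published here. The `Provenance` and `Claim` types, the source keys, and the requirement for an HTTPS link are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative source keys for the four spine pillars and the supporting sources.
SPINE = {"UNHCR", "OCHA-HDX", "ACLED", "IDMC"}
SUPPORTING = {"IOM-DTM", "ReliefWeb", "JRC", "FAO", "WHO", "WorldBank"}

@dataclass(frozen=True)
class Provenance:
    source: str      # publisher key, e.g. "UNHCR"
    dataset_id: str  # hypothetical identifier for the dataset or release
    url: str         # link shown to readers next to the claim or chart

@dataclass(frozen=True)
class Claim:
    text: str
    provenance: Optional[Provenance]

def attribution_errors(claims: List[Claim]) -> List[str]:
    """Return one message per unattributed claim; an empty list means the page passes."""
    errors = []
    known = SPINE | SUPPORTING
    for i, claim in enumerate(claims):
        p = claim.provenance
        if p is None:
            errors.append(f"claim {i}: no source attached")
        elif p.source not in known:
            errors.append(f"claim {i}: unknown source {p.source!r}")
        elif not p.url.startswith("https://"):
            errors.append(f"claim {i}: missing primary-source link")
    return errors
```

Because provenance is attached per record at ingestion, this check can run on every claim rather than once per page.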

What the AI pipeline does, and does not, do

The pipeline does five things.

1. Document ingestion. PDFs, CSVs, and API responses from the spine sources are converted to text and structured tables, with provenance metadata attached at the record level.
2. Retrieval. When a country page or thematic page is being assembled, a vector and lexical hybrid retriever pulls the relevant primary-source passages into the context window. The retrieval set is scoped tightly; we do not let the model average across the whole corpus.
3. Drafting. A frontier LLM produces narrative text grounded in the retrieved passages, with inline references to the source IDs.
4. Verification. A second pass checks every numerical claim against the retrieved source and flags any number that does not match. Unmatched numbers are stripped, not paraphrased.
5. Human review. Every page is reviewed by a human editor before publication. Drafts with unresolved verification flags are returned, not published.
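The verification step is the easiest to make concrete. The sketch below assumes the retrieved source has already been reduced to a list of numeric values and that the draft has been split into sentences; `verify_draft` and `rel_tol` are hypothetical names. It strips whole sentences whose numbers do not match, mirroring the rule that unmatched numbers are removed rather than reworded. It treats every digit run as a claim, so a production verifier would also need separate handling for years, percentages, and unit words such as "million".

```python
import re
from typing import Iterable, List, Tuple

# Matches digit runs with optional thousands separators and decimals, e.g. "1,234" or "56.7".
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")

def parse_number(token: str) -> float:
    return float(token.replace(",", ""))

def matches_source(value: float, source_values: Iterable[float], rel_tol: float) -> bool:
    # Relative tolerance, with a floor of 1.0 so zero-valued cells still compare sensibly.
    return any(abs(value - s) <= rel_tol * max(abs(s), 1.0) for s in source_values)

def verify_draft(
    sentences: List[str], source_values: List[float], rel_tol: float = 0.0
) -> Tuple[List[str], List[str]]:
    """Split draft sentences into (kept, stripped).

    A sentence is kept only if every number in it matches a value in the
    retrieved source; otherwise it is stripped whole, never reworded."""
    kept, stripped = [], []
    for sentence in sentences:
        numbers = [parse_number(tok) for tok in NUMBER.findall(sentence)]
        if all(matches_source(n, source_values, rel_tol) for n in numbers):
            kept.append(sentence)
        else:
            stripped.append(sentence)
    return kept, stripped
```

With the default `rel_tol=0.0` a number must match a source value exactly; a looser tolerance is a human-set policy choice, not something the verifier should infer.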

The pipeline does not do five things.

It does not generate quotes or first-person testimony. Per our content policy, the site never fabricates stories or anecdotes about named individuals.
It does not produce predictions or forecasts that lack a published primary-source basis.
It does not write recommendations for specific governments or agencies.
It does not auto-publish. There is no scheduled job that pushes AI-drafted content to readers without human sign-off.
It does not retain user prompts as training data for any third-party model.
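The no-auto-publish rule amounts to a gate that fails closed. In this hypothetical sketch, `Draft`, `PublishBlocked`, and `publish` are illustrative names rather than the site's code: a draft with unresolved verification flags, or without an editor's sign-off, raises an error instead of publishing.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Draft:
    page_id: str
    unresolved_flags: List[str] = field(default_factory=list)  # open verification flags
    approved_by: Optional[str] = None                           # editor who signed off, if any

class PublishBlocked(Exception):
    """Raised when a draft is not eligible for publication."""

def publish(draft: Draft) -> str:
    # Any unresolved flag sends the draft back, even if an editor has approved it.
    if draft.unresolved_flags:
        raise PublishBlocked(
            f"{draft.page_id}: {len(draft.unresolved_flags)} unresolved flag(s), returned to editor"
        )
    # Only a recorded human approval lets a clean draft through.
    if not draft.approved_by:
        raise PublishBlocked(f"{draft.page_id}: no human sign-off")
    return f"published {draft.page_id} (approved by {draft.approved_by})"
```

Raising rather than returning a status makes it harder for a scheduled job to skip the check by ignoring a return value.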

Where the human layer lives

The phrase "human in the loop" is used loosely in the industry, so it is worth being specific about where the humans actually sit.

Source selection is human-curated. We choose which datasets enter the spine.
Retrieval scope is human-curated. Each page template defines which source set its retrievals are allowed to draw from.
Verification thresholds are human-set. Numerical claims must match the primary source to defined tolerances; mismatches block publication.
Editorial review is human-performed. Editors check for tone, accuracy, and the more subtle failure mode where a model cites a source accurately but draws a conclusion the source does not support.
Reader-visible corrections are human-issued. When we find errors after publication, the correction goes on the page with a date.
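Per-template retrieval scope and hybrid scoring can be sketched together. Everything here is an assumption for illustration: the template names, the `TEMPLATE_SCOPES` mapping, precomputed embeddings, word overlap as the lexical score, and a linear blend weighted by `alpha`. This page does not say how the two retrieval signals are combined; reciprocal rank fusion is a common alternative to the blend used here.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

@dataclass(frozen=True)
class Passage:
    passage_id: str
    source: str
    text: str
    embedding: tuple  # assumed to be precomputed by some embedding model

# Hypothetical scopes: which sources each page template may draw from.
TEMPLATE_SCOPES = {
    "country_page": {"UNHCR", "OCHA-HDX", "ACLED", "IDMC"},
    "conflict_theme": {"ACLED"},
}

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def lexical(query: str, text: str) -> float:
    # Fraction of query words that appear in the passage (a stand-in for BM25).
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def retrieve(template: str, query: str, query_embedding: Sequence[float],
             passages: List[Passage], k: int = 3, alpha: float = 0.5) -> List[Passage]:
    # Unknown templates raise KeyError, so nothing is retrieved without an explicit scope.
    allowed = TEMPLATE_SCOPES[template]
    in_scope = [p for p in passages if p.source in allowed]
    ranked = sorted(
        in_scope,
        key=lambda p: alpha * cosine(query_embedding, p.embedding)
        + (1 - alpha) * lexical(query, p.text),
        reverse=True,
    )
    return ranked[:k]
```

Filtering before scoring means an out-of-scope passage can never outrank an in-scope one, however similar it looks.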

Failure modes we have hit

Publishing this honestly means saying what has gone wrong.

Plausible misattribution. A draft once attributed a 2023 conflict-event total to ACLED that was actually a sum of two non-comparable categories. The verification pass missed it because the digits matched a real cell in the source; the editor caught it. We tightened the verifier to check column semantics, not just values.
Stale-source confidence. Early drafts sometimes cited UNHCR figures that had been superseded by a newer mid-year release. We now require the freshness layer to reject any release that has been superseded by a more recent publication for the same country-year.
Cross-source averaging. When two sources disagreed (a common case), early drafts split the difference. We now require the draft to surface the disagreement explicitly and cite both sources, rather than producing a synthetic midpoint.
Tone drift on protection-sensitive topics. We monitor for it; we re-edit when we catch it.
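The fixes for stale sources and cross-source disagreement can be sketched as two small functions. This is an illustrative version under assumed names (`Figure`, `latest_releases`, `reconcile`); it assumes each figure carries its release date and that the figures being reconciled describe the same indicator, country, and year. `reconcile` expects one figure per source, so `latest_releases` should run first.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Figure:
    source: str     # e.g. "UNHCR" or "IDMC"
    country: str    # country code (format assumed)
    year: int
    indicator: str  # hypothetical indicator key
    value: float
    released: date  # publication date of the release this figure came from

def latest_releases(figures: List[Figure]) -> List[Figure]:
    """Keep only the newest release per (source, country, year, indicator).

    Superseded releases are dropped rather than cited."""
    latest: Dict[tuple, Figure] = {}
    for f in figures:
        key = (f.source, f.country, f.year, f.indicator)
        if key not in latest or f.released > latest[key].released:
            latest[key] = f
    return list(latest.values())

def reconcile(figures: List[Figure]) -> Tuple[str, Dict[str, float]]:
    """Report agreement or disagreement across sources; never average.

    Returns (status, {source: value}) so every source can be cited."""
    values = {f.source: f.value for f in figures}
    status = "agree" if len(set(values.values())) <= 1 else "disagree"
    return status, values
```

The function deliberately has no code path that computes a midpoint: when sources differ, the page receives every source's value and has to say so.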

Why this matters for trust

The humanitarian data ecosystem runs on chains of citation that AI-assisted publishing at scale can break silently. Our position is that AI assistance is fine; opacity is not. Every page tells readers what data it draws from. Every chart links to the source. Every number is verifiable in a single click. None of that is incompatible with using AI to draft; all of it is incompatible with publishing AI drafts unread.

If a reader can trace any claim on the site back to a primary source in under sixty seconds, the pipeline is working. If they cannot, we want to know.

Sources and further reading

UNHCR Refugee Data Finder: https://www.unhcr.org/refugee-statistics/
OCHA HDX: https://data.humdata.org/
ACLED: https://acleddata.com/
IDMC GIDD: https://www.internal-displacement.org/database/
IASC Operational Guidance on Data Responsibility: https://interagencystandingcommittee.org/
