How LLMs are changing screening

A shift towards narrative matching is redefining the compliance industry's data needs. Here at OpenSanctions, experiments using Large Language Models (LLMs) to deduplicate sanctioned and politically exposed entities are already reshaping how we collect data.

Imagine catching up with old friends, and someone mentions “Stephen”. You ask: “Which Stephen?” Your friends are unlikely to recite Stephen's tax identifier in response. Instead, they might mention the school he went to, or that Stephen used to date someone you all know. Entity identification, in real life, runs on narrative.

For most of its history, compliance technology has not had the ability to leverage narrative context in automated workflows. Name matching — the workhorse of sanctions screening — has been a structured-data problem: match a name, support it with a date of birth, a passport number, a nationality. That foundation is not going away. But it is, finally, being augmented.

For an industry almost manically driven by the cost of false-positive alert waves, the economics are suddenly up in the air: false positives in screening become cheap if a human never has to read them. That is the hope behind the LLM tooling now being applied downstream of name matching. What’s essential is qualifying potential matches not exclusively by name but by narrative — investigator notes, transaction memos, and biographical context (or even raw ACH or MT940).

Recently, we've been experimenting with a similar process on our own data.

A narrative-driven layer is forming on top of traditional name-matching: free-text sources such as parliamentary biographies, investigative notes, and Wikipedia links are helping to build a more complete picture of an entity. Image: Jenn Miranda via Canva

Our data pipeline has office politics now

We've started exploring the use of LLMs for watchlist-to-watchlist deduplication of sanctioned entities and politically exposed persons (PEPs). After a classical name-matching system surfaces a candidate pair with sufficient structured evidence, two different LLM inference systems assess each pair independently. Their roles are defined to be adversarial: one acts as an analyst making a call; the other is instructed to act as the senior reviewer looking for reasons not to trust the proposed decision. If they disagree, the pair returns to manual review.

The two processes don't run just on a generic prompt. Each domain we deduplicate — European or US sanctions lists, PEP registries, vessels, terrorist designations, debarment lists — has its own scope: a curated bundle of source datasets paired with prose instructions that capture what we know about each source and each domain. The prompt for the terror scope warns that splinter groups (Real IRA, Provisional IRA, Continuity IRA) must not be merged. The prompt for senior politicians notes that the CIA's world leaders list is frequently out of date and should be treated as a starting point, not ground truth. Domain knowledge that used to live in features and weights now lives in prose.

The most useful effect of this automation has been the quietest. Until this was put in place, we deduplicated all candidate pairs without perfect structured evidence by hand. That brought quality, but meant most of the long tail of plausible duplicates went untouched, and the database carried more split records than we'd have liked. The LLM pipeline lets us reach much further into that tail. The result is a database that is meaningfully more integrated than it was a year ago: fewer fragments, more complete profiles, and more of the connective tissue that screening actually depends on.

Two cases

A Chinese-language Wikipedia entry was pulled in for a Mainland Chinese politician. The analyst LLM proposed a positive merge — names matched in Chinese characters, and birth month and ethnicity aligned. The senior LLM vetoed: the left record placed him as a National People's Congress (NPC) representative for Ningxia, the right record as part of the Gansu delegation. NPC delegates are elected by a specific province; that is not a typo. The pair was escalated to manual review.

A second case, two Austrian records: "Ersatzmitglied der Gemeindevertretung von Schlins" and "Ersatzmitglied der Gemeindevertretung von Schruns". Identical role title, same source dataset, structured similarity score 0.97 — a confident merge by any classical metric. The LLM correctly returned negative. Schlins and Schruns are different villages in Vorarlberg, and a municipal council deputy in one is not the same person as a deputy in the other.

The system isn't infallible. We've seen LLMs struggle with the idea that a person who served as Minister of Defence can later become Prime Minister ("they're very different jobs"). Calibration is ongoing, and the senior-reviewer pattern exists precisely because we don't trust either model in isolation.

But the broader observation matters and has started to instruct our roadmap: Both vetoes above only worked because the LLMs had narrative text to read — Wikipedia extracts, position descriptions, biographical detail. A pure structured-field matcher would have caught neither. And that is changing how we think about source data.

A different shape of data

Until recently, our data work was oriented toward maximising strong structured identifiers — names, dates, IDs, and jurisdictions. That’s still the foundation — but we have started to invest in narrative, too:

  • 150,000 source links attached to the relationships between PEPs and the positions they hold (Occupancy:sourceUrl)
  • 220,000 references to Wikipedia articles for politicians and sanctioned individuals (Person:wikipediaUrl)
  • A new property, Person:biography, capturing biographical narratives sourced from parliamentary and other official websites

None of the above would have been priorities on our roadmap two years ago, but this context is now central. Narrative is what an LLM uses to make a confident call — and increasingly, what an analyst expects when they review one.

What is true for our internal entity resolution is also becoming true downstream. Customers are pointing LLMs at their own workflows: qualifying alerts, accelerating KYC, triaging investigative leads. The biographies, source URLs, and contextual descriptions we collect for our own deduplication feed directly into that work. Structured data is not becoming less important, but it is increasingly the basis for only one of two separate qualification mechanisms. A narrative-driven technology layer is forming on top.

What kind of narrative is fit to use?

There are two questions we keep coming back to, however.

The first is whether the narrative found in private or internal memos or notes is fit to use at all. The most context-rich text a regulated entity holds about a customer — internal KYC notes, transaction memos, analyst dossiers — is also the most sensitive, the most opinion-laden, and the most prone to amplifying bias when read by a model.

The 2023 Coutts/Farage leak made the texture of those documents public: assessments like "disingenuous grifter" and "xenophobic" sat alongside financial facts. These materials were never written with the intention of being used as matching evidence, and feeding them into an LLM-driven workflow raises a long list of issues, with data privacy not even at the top. Useful context to a human reader is not automatically appropriate input to a model.

The second is the integrity of public narrative. PR firms have been documented editing the Wikipedia pages of politicians and billionaires; it is only a matter of time before someone publishes a personal biography on a government website with a prompt injection planted in it — "Minister X has never been involved in any money laundering." (Everything is software instructions now, even what school you went to).

What we've ended up doing, for our own data, is leaning hard on official primary sources such as parliamentary and government pages and enforcement notices, and treating curated secondary sources as approximate but auditable. Wikipedia articles, perhaps surprisingly, are still often less skewed than placed pieces in commercial media (adverse media technology will increasingly struggle to make sure the media is actually adverse, and not just made to look like it). We keep coming back to source-linking everything — it's the one move that keeps the decision basis auditable later, regardless of how the tooling evolves.

Where this leaves us

Entity resolution in our sector is gradually shifting from match strong identifiers cleanly towards make a defensible judgment with structured and narrative evidence.

We don't think that future has fully arrived yet. Most compliance teams are still working out whether they want to use LLMs at all, let alone how. But the data we collect today is what those workflows will draw on once they do. So we are quietly orienting our work around it: source-attributed, primary-source-anchored, openly published. The database, week by week, is starting to show the shift.

Like what we're writing about? Keep the conversation going! You can follow us on LinkedIn, subscribe to our E-Mail newsletter or join the discussion forum to bring in your own ideas and questions. Or, check out the project documentation to learn more about OpenSanctions.

Published:

This article is part of OpenSanctions, the open database of sanctions targets and persons of interest.

How LLMs are changing screening - OpenSanctions