As a compliance officer, you’ll often be required to screen names not only against common international sanctions lists but also against watchlists specific to your jurisdiction. Our publicly available dataset browser – a little tucked away on our website (and we’re working on this!) – is a handy tool for understanding the scope of the data available through OpenSanctions, where it comes from, and whether it’s right for your team. Below, we’ll walk you through the basics.
Our Maritime collection is one of eight collections in the OpenSanctions database, and contains more than 127,000 entities. Photo: Adem Percem via Unsplash
Lists within lists
Our data is collected from open sources — mostly official lists published by government bodies and authorities, though most of our Politically Exposed Persons (PEP) data is collected via Wikidata. To ensure data provenance (the process of providing a historical record of exactly where each piece of data comes from), it’s important to us that wherever possible, our data is collected directly from primary sources rather than through third-party aggregators.
Our dataset browser lists the consolidated data sources we import and breaks down the specific sanctions and policy programs nested within each source. The publisher of the specific list is clearly labelled, along with the country, the number of listed entities, and the update frequency. This setup is designed to showcase the full scope of our dataset, so compliance teams know exactly what they’re getting, while also providing full transparency into the provenance of each source.
Lists within lists: using the browser, OpenSanctions users can view specific programs listed within a data source
By clicking “expand programs per source” in the top right, you can view the specific programs and policies listed within each data source, as illustrated above. By clicking on a specific program, you can explore the measures imposed and look up an authority’s description of the program via its website.
In the dataset browser, you can also filter by “recent additions” to view the most recently uploaded datasets, and view data sources by collection (pictured below) — for instance, PEP data, maritime-related sanctions, or warrants and criminal entities.
A screenshot of our dataset browser, which you can filter by recently added datasets and by programs per data source
A plus sign to the left of a source indicates an enrichment-based dataset, where data from a larger dataset is used to enrich the data within OpenSanctions with useful details and entity connections — for instance, the network links of a sanctioned entity that appeared in the ICIJ Offshore Leaks database, or persons or entities of interest from the open knowledge base, Wikidata.
In the dataset browser, clicking a specific data source opens a data overview that provides a breakdown of entity types, data timestamps, source URLs, additional context and other useful metadata. Where possible, we include a brief summary of the dataset to help users understand its origin and potential use.
Our data overview provides a detailed breakdown of each data source
Eliminating the noise
The hundreds of data sources in OpenSanctions are bundled into data collections. Not all screening processes will need to query the whole dataset, and picking your scope carefully is key to reducing false positive alerts.
When screening entities using the matching API, users can limit the results to entities appearing in a specific collection or data source. Query filters such as entity type can narrow the scope of the query further and improve precision; our scoping documentation covers this in more detail.
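To make the idea of scoping concrete, here is a minimal Python sketch of how a scoped matching request might be put together. It assumes the hosted API lives at `https://api.opensanctions.org`, that the collection or data source name goes in the URL path (`/match/<scope>`), that the entity type is passed as a `schema` field, and that `sanctions` is a valid collection name. The helper `build_match_request` is hypothetical, and the request is only built here, not sent, so check the API documentation before relying on any of these details.

```python
import json
from urllib.request import Request

# Assumption: base URL of the hosted OpenSanctions API.
API_BASE = "https://api.opensanctions.org"


def build_match_request(scope, schema, name, api_key):
    """Build (but do not send) a matching request limited to one scope.

    scope  : collection or data source name, e.g. "sanctions" (assumed)
    schema : entity type used to narrow the query, e.g. "Person"
    name   : the name to screen
    """
    payload = {
        "queries": {
            # "q1" is an arbitrary label for this query (hypothetical).
            "q1": {"schema": schema, "properties": {"name": [name]}}
        }
    }
    return Request(
        url=f"{API_BASE}/match/{scope}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: API-key authorization scheme.
            "Authorization": f"ApiKey {api_key}",
        },
        method="POST",
    )


# Screen a person against the sanctions collection only,
# rather than against the whole database.
request = build_match_request("sanctions", "Person", "Jane Doe", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Swapping the `scope` argument for a different collection or a single data source would narrow or widen the pool of candidate entities, which is the main lever for keeping false positives down.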
While our database has grown significantly over the last few years, our focus remains on data quality, rather than quantity. We add new data sources to our database when they are in the public interest, have a legal basis for inclusion, and provide detailed data beyond a name (see our data inclusion criteria here).
Our data roadmap is informed by gaps we identify, customer needs, and data whose investigative quirks we find interesting. If you’ve got a source in mind that isn’t queued on the roadmap and hasn’t already been suggested, you can suggest it via GitHub.
