The database has various filtering mechanisms that expose technical and legal criteria which can be used to limit the scope of a query. Filtering occurs at two levels: the filtering of data sources, and the filtering of classes of entities described within those data sources. Keep in mind that a single entity can be sourced from multiple data sources.
- Entity types (jargon: Schema) describe categories of logical entities: Person, Company, Vessel, CryptoWallet. If you are screening a list that includes both organizations and natural persons, use LegalEntity, an umbrella term for both. Schemata are explained in the data dictionary.
- Collections are scopes that combine different data sources with similar meaning. The whole database is contained in the default collection, while sanctions is a subset of sources limited to government-issued sanctions lists. Inside of that, us_sanctions limits the scope to only US (federal) watchlists, and eu_sanctions combines EU and member state watchlists. Additional collections are listed here.
- Specific data sources (eg. us_ofac_sdn) can also be a filter. For example, you may wish to query all sanctions lists, except those published by China (cn_sanctions) and Russia (ru_mfa_sanctions).
- Risk topics are a taxonomy of the risk factors that apply to specific entities. A person can be tagged role.pep or sanction; a company might be sanction.linked, reg.warn, etc.
- Topics identify a category of risk, but not its origin. Sanction entities close that gap: they're linked to companies and people, and detail the name of the sanctioning authority, the reason, time span, and measures imposed.
- Sanctions programs detail individual policy instruments under which an entity was designated. For example, the US sanctions list includes companies and people sanctioned under 30+ programs, most of them linked to a specific geopolitical conflict (eg. Ukraine) or topical focus (eg. Cyber warfare).
In Practice
Using the default collection endpoint (/match/default) is a good place to start. Pick relevant topics (eg. sanction, sanction.linked for sanctions screening, add role.pep, role.rca and reg.action for basic AML checks), and run some experiments.
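As a minimal sketch of this starting point, the snippet below builds a /match/default URL restricted to a set of topics. The `topics` query parameter name and the `https://api.opensanctions.org` base URL are assumptions not stated in this section, and `match_url` is a hypothetical helper; check the API reference for your deployment before relying on them.

```python
from urllib.parse import urlencode

# Assumption: base URL of the hosted API; replace with your own yente instance.
BASE_URL = "https://api.opensanctions.org"

# Topic sets taken from the text above.
SANCTIONS_TOPICS = ["sanction", "sanction.linked"]
AML_TOPICS = SANCTIONS_TOPICS + ["role.pep", "role.rca", "reg.action"]


def match_url(collection, topics):
    """Build a match URL that repeats the (assumed) `topics` parameter once per topic."""
    query = urlencode([("topics", topic) for topic in topics])
    return f"{BASE_URL}/match/{collection}?{query}"


print(match_url("default", SANCTIONS_TOPICS))
```

Keeping the topic sets as named lists makes it easy to rerun the same experiment with a narrower or broader risk scope.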
Then, use the include_dataset and exclude_dataset parameters to either pick a custom set of sources, or exclude sources that don't have regulatory relevance and produce false positives. To query only a select set of datasets, repeat include_dataset once per dataset: /match/default?include_dataset=us_ofac_sdn&include_dataset=us_ofac_cons. To drop a specific dataset from a collection query, use exclude_dataset: /match/default?exclude_dataset=iq_aml_list.
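The two query strings above can be assembled programmatically. This is a small sketch using only the Python standard library; `dataset_filter_query` is a hypothetical helper name, not part of any client library.

```python
from urllib.parse import urlencode


def dataset_filter_query(include=(), exclude=()):
    """Return a query string with one include_dataset/exclude_dataset pair per source."""
    params = [("include_dataset", name) for name in include]
    params += [("exclude_dataset", name) for name in exclude]
    return urlencode(params)


# Reproduces the two example URLs from the text.
print("/match/default?" + dataset_filter_query(include=["us_ofac_sdn", "us_ofac_cons"]))
print("/match/default?" + dataset_filter_query(exclude=["iq_aml_list"]))
```

Passing a list of pairs to `urlencode` keeps repeated parameter names intact, which a plain dictionary would collapse into a single entry.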
Avoid using the peps collection; instead, filter for the relevant topics (role.pep, role.rca, and poi) and consider implementing country filters.
On-premise: Using a manifest to create custom collections
When using the on-premise version of yente, you can also use the custom datasets function to define custom collections. To do this, add a manifest file like this:
catalogs:
  - url: "https://data.opensanctions.org/datasets/latest/index.json"
    scope: sanctions
    resource_name: entities.ftm.json
datasets:
  - name: europe
    title: European datasets
    datasets:
      - eu_fsf
      - eu_travel_bans
      - eu_sanctions_map
      - be_fod_sanctions
      - fr_tresor_gels_avoir
This will create a new dataset collection named europe, which can be used in query endpoints, e.g. /match/europe and /search/europe.
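To illustrate querying the europe collection, here is a hedged sketch that prepares a match request without sending it. The `http://localhost:8000` address, the `queries` body layout (an ID mapped to an example entity with `schema` and `properties`), and the `build_match_request` helper are all assumptions for illustration; confirm them against the API documentation of your yente version.

```python
import json
from urllib.request import Request

# Assumption: a local yente instance; adjust host and port to your deployment.
YENTE_URL = "http://localhost:8000"


def build_match_request(collection, names, schema="LegalEntity"):
    """Prepare (but do not send) a POST request against /match/<collection>."""
    # Assumed body layout: a `queries` mapping of arbitrary IDs to example entities.
    queries = {
        f"q{i}": {"schema": schema, "properties": {"name": [name]}}
        for i, name in enumerate(names, start=1)
    }
    return Request(
        f"{YENTE_URL}/match/{collection}",
        data=json.dumps({"queries": queries}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "Example Trading Ltd" is a made-up name used only for illustration.
request = build_match_request("europe", ["Example Trading Ltd"])
print(request.get_method(), request.full_url)
# Pass `request` to urllib.request.urlopen() to run it against a live instance.
```

The default schema is LegalEntity so that a mixed list of organizations and natural persons can be screened with one call.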
Caveats
- Entity profiles returned from the API will always include attribute values from all data sources in the database. Detailed explanation.
- In rare cases, sanctions data sources list secondary entities which are not sanctioned. Hence, it’s possible for an entity to feature a sanctions dataset as a source, but not be tagged with the sanction topic.
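The last caveat suggests checking the topic rather than the source dataset when deciding whether a result is actually sanctioned. The sketch below assumes each returned entity is a dictionary with a `datasets` list and a `properties.topics` list; both sample entities and the `is_sanctioned` helper are hypothetical.

```python
def is_sanctioned(entity):
    """True only if the entity itself carries the `sanction` topic."""
    topics = entity.get("properties", {}).get("topics", [])
    return "sanction" in topics


# Hypothetical entities, both sourced from a sanctions dataset.
designated = {"id": "ex-1", "datasets": ["us_ofac_sdn"], "properties": {"topics": ["sanction"]}}
secondary = {"id": "ex-2", "datasets": ["us_ofac_sdn"], "properties": {"topics": []}}

# Only the entity tagged with the topic passes, despite the shared source.
print([e["id"] for e in (designated, secondary) if is_sanctioned(e)])
```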