Inside the OpenSanctions database, information from different sources is stored in a statement-based data model which lets us trace the origin and temporal range of each attribute of any tracked entity.
This page covers an advanced topic: most users will not need to interact with the statement-based data. See also: export formats guide, entity data model.
The OpenSanctions database is designed to meet the following design objectives:
In order to meet these goals, the system uses a statement-based database design. To illustrate this, think of a claim like this one: the US sanctions list, as of the most recent update, claims that entity ofac-12345
has the property name
set to the value John Doe
.
All information in the system is stored using this way, including links between different entites. When exports are generated, the system will do so by grouping all the statements that pertain to a particular entity and combining them into the desired format. The forms of the data we publish, including the JSON format returned by the API, are simplified/aggregated for easier use. If you're trying to use the data in a way that does not require granular provenance, use these more common export formats.
As a database schema, this results in a table with the following columns:
Column | Type | Description |
---|---|---|
entity_id | ID(255) | (source ID): the entity identifier as derived from the data source, e.g. ofac-nnnn for entries on the US OFAC sanctions lists. |
prop | string(255) | (property): the entity attribute that this statement relates to, e.g. birthDate , or name (see data dictionary). |
prop_type | string(255) | (property type): the data type of the given property, e.g. date , country , name etc. |
value | string(65535) | Actual value of the property for the entity. If multiple values are indicated in the source data, each of them will result in a separate statement. |
lang | string(3) | Language (3 letter code) of the value, if it is known. |
original_value | string(65535) | Property value before it was cleaned (e.g. country name vs. code, unparsed date). |
dataset | string(255) | Source dataset identifier (same as the dataset URL slugs). |
schema | string(255) | Type of the given entity. Specific statements can indicate more or less specific schemata, e.g. LegalEntity and Company (the resulting entity would be a Company ) - see data dictionary. |
first_seen | iso_ts | First date when the processing pipeline found this value linked to the given entity. Please note that this only records values after July 2021, when we started tracking the data - more realistic evidence of when an entity was added to the given sanctions list can be found in the createdAt property. |
last_seen | iso_ts | Latest date when the processing pipeline found this value. |
target | boolean | The entity to which this statement belong is a target - a designated sanctioned entity - in the data source. |
canonical_id | ID(255) | Entity identifier as resulting from our de-duplication process. If no de-duplication has been performed, this is the same as entity_id . If the entity has been merged with others inside the same database, it will receive an ID starting with NK- (short for nomenklatura). If the entity has been de-referenced against Wikidata, the Wikidata item ID (like Q7747 ) will override any NK- IDs. |
On the web site: If you'd like to play with how this looks in practice, you can use the raw data explorer to browse and filter the statement data. You can add extra query parameters like schema
, entity_id
, canonical_id
, prop
, etc. in your browser's URL bar.
Bulk data access: The statements table is exported to CSV format every night as part of the regular data pipeline runs. You can fetch the data here:
This file is subject to non-commercial licensing. You can import this CSV into a local install of the OpenSanctions codebase using the opensanctions import-statements [file]
command.
OpenSanctions is free for non-commercial users. Businesses must acquire a data license to use the dataset.