Statement data model

Inside the OpenSanctions database, information from different sources is stored in a statement-based data model which lets us trace the origin and temporal range of each attribute of any tracked entity.

This page covers an advanced topic: most users will not need to interact with the statement-based data. See also: export formats guide, entity data model.

The OpenSanctions database is designed to meet the following design objectives:

Be able to dynamically merge and un-merge entities from a broad range of data sources.
Be able to identify the origin of each piece of information about a sanctions target or other entity.
Track entities and their properties as they change over time.

In order to meet these goals, the system uses a statement-based database design. To illustrate this, think of a claim like this one: the US sanctions list, as of the most recent update, claims that entity ofac-12345 has the property name set to the value John Doe.

All information in the system is stored using this way, including links between different entites. When exports are generated, the system will do so by grouping all the statements that pertain to a particular entity and combining them into the desired format. The forms of the data we publish, including the JSON format returned by the API, are simplified/aggregated for easier use. If you're trying to use the data in a way that does not require granular provenance, use these more common export formats.

Model implementation

As a database schema, this results in a table with the following columns:

Column	Type	Description
`entity_id`	ID(255)	(source ID): the entity identifier as derived from the data source, e.g. `ofac-nnnn` for entries on the US OFAC sanctions lists.
`prop`	string(255)	(property): the entity attribute that this statement relates to, e.g. `birthDate`, or `name` (see data dictionary).
`prop_type`	string(255)	(property type): the data type of the given property, e.g. `date`, `country`, `name` etc.
`value`	string(65535)	Actual value of the property for the entity. If multiple values are indicated in the source data, each of them will result in a separate statement.
`lang`	string(3)	Language (3 letter code) of the value, if it is known.
`original_value`	string(65535)	Property value before it was cleaned (e.g. country name vs. code, unparsed date).
`dataset`	string(255)	Source dataset identifier (same as the dataset URL slugs).
`schema`	string(255)	Type of the given entity. Specific statements can indicate more or less specific schemata, e.g. `LegalEntity` and `Company` (the resulting entity would be a `Company`) - see data dictionary.
`first_seen`	iso_ts	First date when the processing pipeline found this value linked to the given entity. Please note that this only records values after July 2021, when we started tracking the data - more realistic evidence of when an entity was added to the given sanctions list can be found in the `createdAt` property.
`last_seen`	iso_ts	Latest date when the processing pipeline found this value.
`target`	boolean	The entity to which this statement belong is a target - a designated sanctioned entity - in the data source.
`canonical_id`	ID(255)	Entity identifier as resulting from our de-duplication process. If no de-duplication has been performed, this is the same as `entity_id`. If the entity has been merged with others inside the same database, it will receive an ID starting with `NK-` (short for nomenklatura). If the entity has been de-referenced against Wikidata, the Wikidata item ID (like `Q7747`) will override any `NK-` IDs.

Accessing the statements data

On the web site: If you'd like to play with how this looks in practice, you can use the raw data explorer to browse and filter the statement data. You can add extra query parameters like schema, entity_id, canonical_id, prop, etc. in your browser's URL bar.

Bulk data access: The statements table is exported to CSV format every night as part of the regular data pipeline runs. You can fetch the data here:

statements.csv

This file is subject to non-commercial licensing. You can import this CSV into a local install of the OpenSanctions codebase using the opensanctions import-statements [file] command.

Got more questions? Our support is here to help. You can also join the Slack chat to meet the community.

FAQ Understanding the data

Entity structure

Data dictionary

Identifiers and de-duplication

Matcher training data

Statement-based data

Data enrichment