Important Concepts

OpenSanctions uses a specific set of terms to describe both the technical and domain concepts that structure our understanding of compliance data.

Entities

Entities are the data atoms of OpenSanctions. They can describe real things - like a person, company, passport or airplane - or somewhat abstract notions like a sanctions designation. Each entity has a set of properties, like a name, date of birth, or tax identifier.

Entities are organized into a network graph that describes the connections which exist between them. Our data model describes some types of relationships as entities which in turn have their own properties.
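To make the shape of this model concrete, here is a minimal sketch of entities as records with multi-valued properties, plus a relationship that is itself an entity. The class `Entity`, the helper `links_of`, and all IDs, type names and property names below are illustrative assumptions, not the published schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entity:
    """Hypothetical in-memory model: an ID, a type, and multi-valued properties."""
    id: str
    schema: str  # entity type; the names used below are assumptions
    properties: Dict[str, List[str]] = field(default_factory=dict)

    def first(self, prop: str) -> Optional[str]:
        """Return the first value of a property, or None if it is not set."""
        values = self.properties.get(prop, [])
        return values[0] if values else None


def links_of(entity_id: str, entities: List[Entity]) -> List[Entity]:
    """Return the entities that reference the given ID in any of their properties."""
    return [
        e for e in entities
        if e.id != entity_id
        and any(entity_id in values for values in e.properties.values())
    ]


person = Entity("ex-person-1", "Person",
                {"name": ["Jane Doe"], "birthDate": ["1970-01-01"]})
company = Entity("ex-company-1", "Company",
                 {"name": ["Example Holdings Ltd"], "taxNumber": ["123456"]})
# A relationship expressed as an entity with its own properties.
ownership = Entity("ex-own-1", "Ownership",
                   {"owner": [person.id], "asset": [company.id], "percentage": ["51"]})

graph = [person, company, ownership]
```

Calling `links_of(person.id, graph)` returns the ownership record, which in turn points to the company: the relationship is a node in the graph rather than a bare edge.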

Datasets

Datasets are the groupings in which entities are organized. Every entity is part of at least one dataset, which describes the data source from which it was retrieved (e.g. the main US sanctions list). We provide metadata for each dataset, describing its publisher, data coverage and the limitations inherent in the source. You can download bulk exports in standardized data formats for each dataset.

OpenSanctions combines entities from multiple sources into collections, described below.

PS. Did you know that the W3C defines a dataset as a "set of data"? Computer science is thrilling stuff, people.

Collections

A collection is a bundle of entities (people, companies, etc.) from multiple data sources. These sources might have a similar topical focus (e.g. international sanctions, or procurement bans). Data users can pick a suitable collection when integrating data into a downstream application or screening entities using the API.

The main collection produced by OpenSanctions (and available via our API) is called default.

Entity of interest

OpenSanctions generates and publishes a comprehensive database of risk-linked entities. The term "entity of interest" is used to describe companies or people that are not directly subject to sanctions but are included in OpenSanctions for other reasons. It does not carry a specific legal meaning, and does not imply wrongdoing.

Having grown from our core focus of aggregating sanctions data, OpenSanctions also includes politically exposed persons, companies and people included in debarment databases, criminal watchlists, and other domain datasets related to corruption, money laundering and other forms of financial crime. We use data enrichment to find companies linked to sanctioned entities, which may be subject to secondary sanctions.

If you are uncertain why an entity is included, we recommend a) looking at the "Relationships" section of the entity profile to see if it is linked to a primary risk source, and b) checking the metadata description of the data sources (linked at the bottom of the entity profile) to understand the rationale for its composition.

Targets

The data sources included in the database usually designate specific entities of interest, e.g. the target of a financial sanction or criminal penalty, or a political officeholder. In addition to those target entities, some lists include contextual entities, e.g. a set of companies that a sanctioned entity is linked to, or the relatives of a politician. Each entity in the output data includes a boolean field, target, which enables you to distinguish between targeted and secondary entities.

We do not recommend relying on the target flag to decide which entities to include, e.g. in a screening process. Use risk topics instead.
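As a rough illustration of the difference, the sketch below selects entities by topic rather than by the target flag. The record layout, the `topics` property name, the topic values in `RISK_TOPICS` and the helper `is_relevant` are assumptions made for this example; check the data dictionary for the actual names.

```python
# Assumed topic values, for illustration only.
RISK_TOPICS = {"sanction", "role.pep"}


def is_relevant(entity: dict, topics: set = RISK_TOPICS) -> bool:
    """True if the entity carries at least one of the selected risk topics."""
    entity_topics = entity.get("properties", {}).get("topics", [])
    return any(topic in topics for topic in entity_topics)


entities = [
    {"id": "ex-1", "target": True, "properties": {"topics": ["sanction"]}},
    # Contextual entity: not a target, but linked to a sanctioned entity.
    {"id": "ex-2", "target": False, "properties": {"topics": ["sanction.linked"]}},
    # A politician: the target flag alone says nothing about the kind of risk.
    {"id": "ex-3", "target": True, "properties": {"topics": ["role.pep"]}},
    {"id": "ex-4", "target": False, "properties": {}},
]

by_flag = [e["id"] for e in entities if e["target"]]
by_topic = [e["id"] for e in entities if is_relevant(e)]
sanctions_only = [e["id"] for e in entities if is_relevant(e, {"sanction"})]
```

Filtering by topic lets a screening process state which kinds of risk it covers (here, sanctions only versus sanctions and political exposure), which a single boolean cannot express.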

Caption

Data related to a single entity is often aggregated from multiple sources. Those may describe the same person or company using multiple names, such as Emmanuel Macron, MACRON, Emmanuel or Эмманюэль Макрон. While the full list of available names is provided as part of the entity data, you can also use the caption field, which is an algorithmically-picked, preferred display name for the entity. The following criteria are applied when picking a display name:

  • Names written in the Latin alphabet are preferred over names in other scripts.
  • If multiple sources identify the same name, that name is given preference.
  • A title-cased name (Emmanuel Macron) is picked over upper-cased names (Emmanuel MACRON).

If none of these criteria settles the choice, the name with the lowest combined edit distance to all other names is chosen.
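The selection logic above can be sketched as a chain of filters followed by the edit-distance fallback. This is not the actual implementation: the function names, the order in which the criteria are applied, the use of `str.istitle()` as the title-case check, and plain Levenshtein distance as the edit distance are all assumptions.

```python
import unicodedata
from collections import Counter
from typing import List, Optional


def is_latin(name: str) -> bool:
    """True if every alphabetic character in the name is a Latin-script letter."""
    letters = [c for c in name if c.isalpha()]
    return bool(letters) and all("LATIN" in unicodedata.name(c, "") for c in letters)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def pick_caption(names: List[str]) -> Optional[str]:
    """Pick a display name from all names reported for one entity (sketch)."""
    if not names:
        return None
    # 1. Prefer Latin-script names; keep everything if there are none.
    pool = [n for n in names if is_latin(n)] or list(names)
    # 2. Prefer the name(s) reported most often across sources.
    counts = Counter(pool)
    top = max(counts.values())
    pool = [n for n in counts if counts[n] == top]
    # 3. Prefer title-cased names over other casings.
    pool = [n for n in pool if n.istitle()] or pool
    # 4. Fallback: lowest combined edit distance to all names.
    return min(pool, key=lambda n: sum(edit_distance(n, other) for other in names))


names = ["MACRON, Emmanuel", "Emmanuel MACRON", "Emmanuel Macron", "Эмманюэль Макрон"]
caption = pick_caption(names)
```

With the names from the example above, the Cyrillic spelling is dropped in step 1, no name is repeated so step 2 keeps all three Latin spellings, and step 3 leaves only the title-cased form. The fallback in step 4 only decides the outcome when several candidates survive, for example when every name is upper-cased.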