Entity structure

OpenSanctions uses the FollowTheMoney data model to structure its entity graph. Below is an explanation of how entities and their relationships are designed.

FollowTheMoney (FtM) defines a graph data model for storing information relevant to anti-money laundering analysis. You will need to understand three concepts: entities, entity references, and entity streams.

Entities

Entities are often expressed as JSON objects with three key fields: a unique id, a schema that specifies the type of the entity, and a set of properties. All properties are multi-valued, and their values are strings (or nested objects in API responses).

{
    "id": "1b38214f88d139897bbd13eabde464043d84bbf9",
    "schema": "Person",
    "properties": {
        "name": ["John Doe"],
        "nationality": ["us", "au"],
        "birthDate": ["1982"]
    }
}

Which properties can be set for an entity is determined by its schema. For example, a Person has a nationality, while a Company allows for setting a jurisdiction. Both properties, however, have the same property type, country. You can see a full listing of the available schemata and their properties in the data dictionary.
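Because every property is multi-valued, code that reads an entity should expect a list even when only one value is present. The following is a minimal sketch in plain Python that operates on the JSON object shown above; the helper names `get_values` and `first_value` are hypothetical and are not part of any FtM library.

```python
from typing import Any, Dict, List, Optional

# The example entity from above, as a plain Python dict.
entity: Dict[str, Any] = {
    "id": "1b38214f88d139897bbd13eabde464043d84bbf9",
    "schema": "Person",
    "properties": {
        "name": ["John Doe"],
        "nationality": ["us", "au"],
        "birthDate": ["1982"],
    },
}


def get_values(entity: Dict[str, Any], prop: str) -> List[Any]:
    """Return all values of a property, or an empty list if it is unset."""
    return list(entity.get("properties", {}).get(prop, []))


def first_value(entity: Dict[str, Any], prop: str) -> Optional[Any]:
    """Return the first value of a property, or None if it is unset."""
    values = get_values(entity, prop)
    return values[0] if values else None


nationalities = get_values(entity, "nationality")
birth_date = first_value(entity, "birthDate")
missing = first_value(entity, "deathDate")
```

Returning an empty list for unset properties keeps calling code free of special cases for missing keys.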

Metadata attributes

OpenSanctions expands this basic structure with some additional metadata attributes at the base of the entity data object:

  • datasets lists all data sources from which specific facts (i.e. property values) are included in the entity. See the glossary.
  • referents is a list of all the source entity IDs which have been grouped into a merged entity. See identifiers and deduplication for important context about this.
  • caption is a pre-selected display name for the entity. It is one of the names listed in properties, chosen as the name that differs least from all the other names. The selection is strongly biased towards names written in Latin script whenever a Latin name is available.
  • target is a legacy field (see the glossary, and use topics instead).

Timestamp attributes

Entities also contain a number of timestamps at the root of the entity object:

  • last_change - The last time some value in the entity changed - either because it changed at the source, or because our data cleaning was updated, which usually results in only a subtle change in value.
  • first_seen - The first time an entity was included in the dataset - either the date the dataset was first created, or the date the entity showed up in the source.
  • last_seen - When the dataset was last re-generated. This is almost never what you need. It is maintained for backward compatibility.

These values are partial ISO 8601 dates with precision to the second, in UTC. They represent changes we observe as a result of our data processing. The last_change timestamp is also available at the dataset level in the dataset metadata.

Some entity types include their own dates if available from the data source, e.g. the Sanction listingDate, startDate and endDate properties. Use these in preference to the system timestamps in the entity root.
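As an illustration of working with the system timestamps, the sketch below checks whether an entity has changed since a given cutoff. It assumes the value looks like `2023-04-01T10:30:00` (possibly with a trailing `Z`); that exact rendering is an assumption and not confirmed by the text above, and the helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def parse_timestamp(value: str) -> datetime:
    """Parse a second-precision ISO 8601 timestamp and mark it as UTC.

    Assumption: the value is formatted as YYYY-MM-DDTHH:MM:SS, optionally
    followed by "Z". The documentation states the values are in UTC.
    """
    parsed = datetime.fromisoformat(value.rstrip("Z"))
    return parsed.replace(tzinfo=timezone.utc)


def changed_since(entity: Dict[str, Any], cutoff: datetime) -> bool:
    """Return True if the entity's last_change is later than the cutoff."""
    last_change = entity.get("last_change")
    if last_change is None:
        return False
    return parse_timestamp(last_change) > cutoff


# Hypothetical entity, reduced to the fields used here.
sample = {"id": "example-id", "last_change": "2023-04-01T10:30:00"}
cutoff = datetime(2023, 1, 1, tzinfo=timezone.utc)
is_recent = changed_since(sample, cutoff)
```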

References

Entities can reference other entities. This is achieved via a special property type, entity. Properties of this type simply store the ID of another entity. For example, a Passport entity can be linked to a Person entity via its holder property:

{
    "id": "passport-entity-id",
    "schema": "Passport",
    "properties": {
        "holder": ["person-entity-id"],
        "number": ["CJ 7261817"]
    }
}
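To follow a reference, a consumer typically builds an index of entities by ID and looks up the values of the entity-typed property. The sketch below assumes that property values are plain ID strings (as in the example above, not the nested objects an API may return); `resolve` is a hypothetical helper and the Person entity is made up for illustration.

```python
from typing import Any, Dict, List

# Illustrative entities; the Person is invented to match the Passport above.
person = {
    "id": "person-entity-id",
    "schema": "Person",
    "properties": {"name": ["John Doe"]},
}
passport = {
    "id": "passport-entity-id",
    "schema": "Passport",
    "properties": {
        "holder": ["person-entity-id"],
        "number": ["CJ 7261817"],
    },
}

# Index all known entities by their ID for constant-time lookups.
index: Dict[str, Dict[str, Any]] = {e["id"]: e for e in (person, passport)}


def resolve(entity: Dict[str, Any], prop: str,
            index: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the entities referenced by an entity-typed property.

    IDs that are not present in the index are skipped silently.
    """
    ids = entity.get("properties", {}).get(prop, [])
    return [index[i] for i in ids if i in index]


holders = resolve(passport, "holder", index)
holder_names = [name for h in holders for name in h["properties"]["name"]]
```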

Interstitial entities

A link between two entities often has attributes of its own. For example, an investigator looking at a person who owns a company might want to know when that interest was acquired, and what percentage of shares the person holds.

This is addressed by making interstitial entities. In the example above, an Ownership entity would be created, with references to the person as its owner property and to the company as its asset property. That entity can then define further properties, including startDate and percentage:

{
    "id": "ownership-entity-id",
    "schema": "Ownership",
    "properties": {
        "owner": ["person-entity-id"],
        "asset": ["company-entity-id"],
        "startDate": ["2020-01-01"],
        "percentage": ["51%"]
    }
}

Note: It is tempting to simplify this model by assuming that entities derived from Thing are node entities, and those derived from Interval are edges. This assumption is false and will lead to nasty bugs in your code.
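The Ownership pattern can be traversed by scanning for interstitial entities that point at a given asset. The following sketch selects entities by their explicit schema and property names rather than by any Thing/Interval assumption; `owners_of` is a hypothetical helper, not a FtM API.

```python
from typing import Any, Dict, Iterable, List

ownership = {
    "id": "ownership-entity-id",
    "schema": "Ownership",
    "properties": {
        "owner": ["person-entity-id"],
        "asset": ["company-entity-id"],
        "startDate": ["2020-01-01"],
        "percentage": ["51%"],
    },
}


def owners_of(company_id: str,
              entities: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """List owners of a company together with the link's own attributes."""
    results = []
    for entity in entities:
        if entity.get("schema") != "Ownership":
            continue
        props = entity.get("properties", {})
        if company_id not in props.get("asset", []):
            continue
        # Properties are multi-valued, so one Ownership may name several owners.
        for owner_id in props.get("owner", []):
            results.append({
                "owner": owner_id,
                "percentage": props.get("percentage", []),
                "startDate": props.get("startDate", []),
            })
    return results


links = owners_of("company-entity-id", [ownership])
```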

Streams

Many tools in the FtM ecosystem use streams of entities to transfer or store information. Entity streams are simply sequences of entity objects that have been serialised to JSON as single lines without any indentation, each entity separated by a newline.

Entity streams are read and produced by virtually every part of the FollowTheMoney command-line tools, OpenSanctions, and the Aleph platform. When stored to disk as a file, the extensions .ftm or .ijson should be used.
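Since a stream is just one JSON object per line, it can be read and written with the standard library alone. This is a minimal sketch using only Python's `json` module; `read_stream` and `write_stream` are hypothetical helper names, and the in-memory buffer stands in for a `.ftm` or `.ijson` file.

```python
import io
import json
from typing import Any, Dict, Iterable, Iterator, TextIO


def write_stream(fh: TextIO, entities: Iterable[Dict[str, Any]]) -> None:
    """Write each entity as a single unindented JSON line."""
    for entity in entities:
        fh.write(json.dumps(entity) + "\n")


def read_stream(fh: TextIO) -> Iterator[Dict[str, Any]]:
    """Yield one entity per non-empty line, without loading the whole file."""
    for line in fh:
        line = line.strip()
        if line:
            yield json.loads(line)


# Round-trip two small entities through an in-memory stream.
entities = [
    {"id": "a", "schema": "Person", "properties": {"name": ["John Doe"]}},
    {"id": "b", "schema": "Company", "properties": {"jurisdiction": ["us"]}},
]
buffer = io.StringIO()
write_stream(buffer, entities)
serialised = buffer.getvalue()
buffer.seek(0)
loaded = list(read_stream(buffer))
```

Reading line by line keeps memory use constant, which matters for large datasets.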

Tools for working with FtM data

Some further documentation regarding FtM tooling:

Got more questions? Join the Slack chat to ask questions and get support. Or contact us directly to get in touch with our team.