This technical documentation is intended for Python developers who wish to run the OpenSanctions crawlers on their own infrastructure or plan to add their own crawlers to the system.
It is only relevant if you wish to run the OpenSanctions open source code yourself, which the vast majority of data users will not need to do.
Caveat: running the open source code will not produce the same output data as the data published on this website. The differences stem from configuration, mainly related to deduplication and data enrichment. The deduplication mappings, which determine the entity IDs used in the OpenSanctions data, are a commercial asset of OpenSanctions.
The OpenSanctions pipeline handles the following key steps:
These steps are triggered using a command-line utility, opensanctions, which can run parts of this process for specific segments of the data.
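Before scripting any pipeline runs, it can help to confirm that the command-line utility is actually installed in the current environment. The following is a minimal sketch, not part of the OpenSanctions code base: the helper name cli_help is hypothetical, and the --help flag is an assumption based on common CLI conventions rather than something confirmed by this page. It deliberately avoids guessing any subcommand names.

```python
import shutil
import subprocess
from typing import Optional


def cli_help(command: str = "opensanctions") -> Optional[str]:
    """Return the help text printed by `command`, or None if it is not on PATH.

    `cli_help` is a hypothetical helper written for this example.
    """
    path = shutil.which(command)
    if path is None:
        # The utility is not installed (or not visible) in this environment.
        return None
    # Assumption: the utility accepts `--help`, as most command-line tools do.
    result = subprocess.run(
        [path, "--help"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some tools print usage information to stderr instead of stdout.
    return result.stdout or result.stderr


if __name__ == "__main__":
    text = cli_help()
    if text is None:
        print("opensanctions was not found on PATH")
    else:
        print(text)
```

If the utility is found, the printed help text is the authoritative place to look up the available subcommands and options.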
OpenSanctions is free for non-commercial users. Businesses must acquire a data license to use the dataset.