Frequently asked questions

#6: If I run the published source code, will it rebuild the full database?

Category: Bulk data

We publish the source code for the data processing stack used to build the OpenSanctions data. This means that anyone can build their own versions of the data. However, a lot of the value added to OpenSanctions comes from how we use these tools, rather than the tools themselves. For example:

  • You will need to build your own resolver data for entity deduplication between source datasets. We manually approve deduplication decisions to build out the dataset published here, and this data is a proprietary asset that is not included with the source code. When you conduct your own deduplication, different entity identifiers will be generated, so you will not be able to link your entities to the data published here.
  • We use an enrichment process based around loading company registries into the yente service to build out the graph-adjacent context of sanctioned entities. Replicating that process on your own infrastructure is a fairly complex exercise.
  • Certain data sources (e.g. wd_peps, ru_rupep, sy_obsalytics) are built from non-open data. You will need to contact their publishers before including that data in your distribution, or exclude these sources.
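
If you choose to exclude the non-open sources, one option is to filter them out of an entity export after the build. The following is only a minimal sketch, not part of the published tooling: it assumes a line-delimited JSON export in which each entity carries a `datasets` list of source names, and the function name `filter_entities` is hypothetical. It takes the conservative route of dropping any entity that touches an excluded source, since a merged entity may contain properties from that source.

```python
import json
from typing import Iterable, Iterator

# Source names taken from the list above; extend this set with any other
# source you are not permitted to redistribute.
EXCLUDED_SOURCES = {"wd_peps", "ru_rupep", "sy_obsalytics"}


def filter_entities(
    lines: Iterable[str], excluded: set[str] = EXCLUDED_SOURCES
) -> Iterator[dict]:
    """Yield only entities that do not reference any excluded source.

    Assumption: each line is one JSON object with a `datasets` list.
    Entities without that field are passed through unchanged.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entity = json.loads(line)
        if any(name in excluded for name in entity.get("datasets", [])):
            continue  # entity is built at least partly from an excluded source
        yield entity
```

To use it, pass an open file handle as `lines` and write each yielded entity back out with `json.dumps`. Excluding the sources before the build, rather than filtering afterwards, may be preferable if your pipeline allows it.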

When you're building your own version of the data instead of contributing to the project via a data license, please act as an open source contributor instead of a customer. Don't request real-time support via Slack. File GitHub issues and contribute patches if you notice problems with the code base, and contribute documentation if you find gaps in it.
