Frequently asked questions

#170:

How can I use the full KYB datasets?

Category: Bulk data

The KYB collection of datasets includes full copies of several company registers and other international reference databases. These sources are crawled and exported to the FollowTheMoney (FtM) data format as a by-product of data enrichment, but they can also be used directly.

However, please read and understand the dataset description before you rely on this data. Use of the KYB data is subject to the following constraints:

  • We use the term KYB as a way to summarize the genre of these datasets, but the geographic coverage of the datasets is not sufficient to build a global "know-your-business" service. If your use case requires a global database of company information, consider contacting a data vendor like OpenCorporates or Sayari.
  • The number of entities in the KYB collection is ca. 50x larger than that of the default OpenSanctions dataset. This means you will need to provide significant additional processing resources to use it. You should also expect data updates in yente to take significantly longer to process, which may delay the availability of fresh sanctions/risk data in the same instance.
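
For direct use outside yente, the exported files can be processed as a stream rather than loaded into memory, which matters given the size noted above. The following is a minimal sketch, assuming the export is line-delimited JSON with one entity per line carrying id, schema and properties keys. The function names and the two-line sample are hypothetical and only stand in for a real export file.

```python
import io
import json
from collections import Counter
from typing import IO, Iterator


def iter_entities(stream: IO[str]) -> Iterator[dict]:
    """Yield one entity dict per non-empty line of a line-delimited JSON stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_schemata(stream: IO[str]) -> Counter:
    """Count entities by their `schema` field, without holding them all in memory."""
    return Counter(entity.get("schema", "unknown") for entity in iter_entities(stream))


# Hypothetical sample standing in for a real export (assumed layout, not real data).
SAMPLE = (
    '{"id": "ex-1", "schema": "Company", "properties": {"name": ["Example LLC"]}}\n'
    '{"id": "ex-2", "schema": "Person", "properties": {"name": ["Jane Doe"]}}\n'
)

if __name__ == "__main__":
    print(count_schemata(io.StringIO(SAMPLE)))
```

With a downloaded file, the same functions could be called on an open file handle instead of the in-memory sample.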

Importing the data into yente

In order to load the KYB data into yente, you can add an additional catalog to your instance's manifest (see also: custom datasets):

catalogs:
  - url: "https://delivery.opensanctions.com/datasets/latest/default/catalog.json"
    auth_token: "$OPENSANCTIONS_DELIVERY_TOKEN"
    scope: default
    resource_name: entities.ftm.json

  - url: "https://delivery.opensanctions.com/datasets/latest/kyb/catalog.json"
    auth_token: "$OPENSANCTIONS_DELIVERY_TOKEN"
    scopes:
      - ru_egrul
      - icij_offshoreleaks
      - gleif
    resource_name: entities.ftm.json
    namespace: true
datasets: []

Please note the use of namespace: true. This adds a dataset-specific suffix to the identifiers of all entities loaded from the kyb catalog, so that their entity IDs do not overlap with (and overwrite) the entities contained in the (de-duplicated) default dataset.
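
The effect of namespacing can be illustrated with a small sketch. This is not yente's actual implementation: the namespaced_id helper, the HMAC-SHA1 suffix scheme, and the sample identifiers are all assumptions chosen only to show why a dataset-specific suffix prevents one record from overwriting another.

```python
import hashlib
import hmac


def namespaced_id(entity_id: str, dataset: str) -> str:
    """Append a dataset-specific suffix to an entity ID (illustrative scheme only)."""
    digest = hmac.new(dataset.encode("utf-8"), entity_id.encode("utf-8"), hashlib.sha1)
    return f"{entity_id}.{digest.hexdigest()}"


# Two hypothetical records from different catalogs that share a raw identifier.
records = [("default", "Q123"), ("gleif", "Q123")]

raw_index = {}  # keyed by raw ID: the second record overwrites the first
ns_index = {}   # kyb-side IDs are namespaced: both records survive

for dataset, entity_id in records:
    raw_index[entity_id] = dataset
    key = entity_id if dataset == "default" else namespaced_id(entity_id, dataset)
    ns_index[key] = dataset

if __name__ == "__main__":
    print(len(raw_index), len(ns_index))
```

In this sketch the raw index ends up with a single entry while the namespaced index keeps both, mirroring the overwrite that namespace: true is meant to avoid.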
