How can I use the full KYB datasets?
Category: Bulk data
The KYB collection of datasets includes full copies of several company registers and other international reference databases. These sources are crawled and exported to the FtM data format as a side product of data enrichment, but they can be used directly.
However, please read and understand the dataset description before relying on this data. This will help clarify that use of the KYB data is subject to the following constraints:
… yente to take significantly longer to process, which may delay the availability of fresh sanctions/risk data in the same instance.

yente

In order to load the KYB data into yente, you can add an additional catalog to your instance's manifest (see also: custom datasets):
catalogs:
  - url: "https://delivery.opensanctions.com/datasets/latest/default/catalog.json"
    auth_token: "$OPENSANCTIONS_DELIVERY_TOKEN"
    scope: default
    resource_name: entities.ftm.json
  - url: "https://delivery.opensanctions.com/datasets/latest/kyb/catalog.json"
    auth_token: "$OPENSANCTIONS_DELIVERY_TOKEN"
    scopes:
      - ru_egrul
      - icij_offshoreleaks
      - gleif
    resource_name: entities.ftm.json
    namespace: true
datasets: []
Please note the use of namespace: true. This setting adds a dataset-specific suffix to the entity identifiers in all datasets loaded from the kyb catalog, so that their entity IDs do not overlap with (and overwrite) the entities contained in the (de-duplicated) default dataset.
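To illustrate why the suffix matters, here is a minimal Python sketch of an index keyed by entity ID. It is not yente's actual implementation: the suffix scheme (a short hash of the dataset name), the helper names `namespaced_id` and `build_index`, and the entity ID `lei-123` are all hypothetical and chosen only to show the overwrite problem and how a per-dataset suffix avoids it.

```python
import hashlib


def namespaced_id(entity_id: str, dataset: str) -> str:
    """Append a short dataset-derived suffix to an ID (illustrative scheme only)."""
    suffix = hashlib.sha1(dataset.encode("utf-8")).hexdigest()[:8]
    return f"{entity_id}.{suffix}"


def build_index(catalogs, honor_namespace=True):
    """Index entity captions by ID; a repeated ID overwrites the earlier entry."""
    index = {}
    for catalog in catalogs:
        use_suffix = honor_namespace and catalog["namespace"]
        for dataset, entities in catalog["datasets"].items():
            for entity_id, caption in entities:
                key = namespaced_id(entity_id, dataset) if use_suffix else entity_id
                index[key] = caption
    return index


# Hypothetical data: the same ID occurs in the default dataset and in a KYB dataset.
catalogs = [
    {"namespace": False,
     "datasets": {"default": [("lei-123", "Example Holdings (de-duplicated)")]}},
    {"namespace": True,
     "datasets": {"gleif": [("lei-123", "Example Holdings Ltd.")]}},
]

plain = build_index(catalogs, honor_namespace=False)
safe = build_index(catalogs, honor_namespace=True)

print(len(plain))  # 1: the KYB record has overwritten the default one
print(len(safe))   # 2: both records are kept under distinct keys
```

In this sketch the default dataset keeps its original identifiers, and only entities from the namespaced catalog receive a suffix, which mirrors the intent of setting namespace: true on the kyb catalog alone.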