Common issues experienced while deploying and operating yente.
The self-hosted API is, by default, configured to check for new OpenSanctions data releases every 30 minutes (cf. YENTE_SCHEDULE). New data releases are published several times a day.
If new data is available, an indexing process will be started, and requests will switch to the new data once it is fully searchable (ca. 15 minutes).
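The switch-over behaviour can be pictured with a toy, in-memory model of versioned indexes behind a single alias. This is only a sketch of the general pattern, not yente's actual implementation: the class `ToyIndexStore`, its `update` method and the version strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ToyIndexStore:
    """In-memory stand-in for a search cluster: index names plus one alias.

    Hypothetical model for illustration; it does not talk to a real cluster.
    """

    alias: str = "yente-entities-all"
    indices: Set[str] = field(default_factory=set)
    alias_target: Optional[str] = None  # index that queries are served from

    def update(self, version: str) -> bool:
        """Index a data version, then repoint the alias to it.

        Returns True if a re-index happened, False if already up to date.
        """
        new_index = f"{self.alias}-{version}"
        if self.alias_target == new_index:
            return False  # this release is already being served
        # 1. Build the new index; queries still hit the old alias target.
        self.indices.add(new_index)
        # 2. Only after indexing is done, point the alias at the new index.
        self.alias_target = new_index
        # 3. Drop all older snapshots of the index.
        self.indices = {new_index}
        return True
```

Because the alias only moves after the new index is complete, readers of the alias never see a half-built index; the cost is that the very first build has nothing to serve from until it finishes.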
Here is a very quick tour of how yente works:
- When it starts, yente fetches a metadata file from data.opensanctions.org which states the latest version of the OpenSanctions data that has been released.
- If that version has not been indexed yet, it creates a new ES index whose name carries the version timestamp (e.g. yente-entities-all-00220221030xxxx).
- It then downloads the latest data from data.opensanctions.org (a 500MB+ JSON file) and stores it on the /tmp volume of the container.
- Once indexing has finished, yente will create an ES index alias from yente-entities-all to the latest snapshot of the index (e.g. yente-entities-all-00220221030xxxx) and delete all older snapshots of the index.
This means that it takes a while after the first start until the /search and /match APIs work correctly. On the plus side, any future updates to the data will be indexed first, and the switch-over to the new data will be instantaneous.
I get an index_not_found_exception, what's wrong?
This probably means that the initial index-building (described above) never completed. Check the following:
- Make sure that yente can download data from data.opensanctions.org.
- Check whether the index alias yente-entities-all was created. If a timestamped index was created, but the final alias does not exist, it likely means that indexing was aborted half-way. This could be because a) the downloaded data could not be fetched or stored in its entirety, or b) the indexing of entities was aborted, perhaps due to a lack of system memory or compute time.
While debugging this issue, you can use http://yente-service:8000/updatez?token=UPDATE_TOKEN&force=true to trigger a forced re-index of the data at any time. The UPDATE_TOKEN is a secret token you can define in the environment of the yente pod using the YENTE_UPDATE_TOKEN variable.
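For scripted debugging, the forced re-index call could be wrapped roughly as follows. This is a minimal sketch using only the Python standard library. The helper names `build_update_url` and `trigger_update` are hypothetical, and sending the request as a POST is an assumption that should be checked against the API documentation of your yente version.

```python
import urllib.parse
import urllib.request


def build_update_url(base_url: str, token: str, force: bool = True) -> str:
    """Compose the /updatez URL with the token and force query parameters."""
    query = urllib.parse.urlencode({"token": token, "force": str(force).lower()})
    return f"{base_url.rstrip('/')}/updatez?{query}"


def trigger_update(base_url: str, token: str) -> int:
    """Send the re-index request and return the HTTP status code.

    Assumption: the endpoint accepts POST; verify this for your deployment.
    """
    request = urllib.request.Request(build_update_url(base_url, token), method="POST")
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

A call such as `trigger_update("http://yente-service:8000", os.environ["YENTE_UPDATE_TOKEN"])` (with `import os`) would read the token from the same environment variable that the yente pod uses, so the secret does not need to be hard-coded in the script.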
In some cases, you may want to customise the set of datasets which a query is to search, e.g. to select only a subset of relevant datasets. You can use the custom datasets feature to do this by adding a manifest file like this:
catalogs:
  - url: "https://data.opensanctions.org/datasets/latest/index.json"
    # Limit the dataset scope of the entities which will be indexed into yente. Useful
    # values include `default`, `sanctions` or `peps`. This will speed up the update
    # process in which data is re-indexed.
    scope: sanctions
    resource_name: entities.ftm.json
datasets:
  - name: europe
    title: European datasets
    datasets:
      - eu_fsf
      - eu_travel_bans
      - eu_sanctions_map
      - be_fod_sanctions
      - fr_tresor_gels_avoir
      # - gb_hmt_sanctions
This will create a new dataset collection named europe, which can be used in query endpoints, e.g. /match/europe and /search/europe. Please note that the other datasets included in the sanctions collection will still be stored in the index, and attributes originating from those sources will be included in the person and company profiles. If your use case requires building a completely custom dataset, please contact us.
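As an illustration, a client could target the europe collection along these lines. This is a hedged sketch: the host `yente-service:8000` is borrowed from the example URL earlier in this FAQ, the helpers `build_match_request` and `match` are hypothetical, and the request body shape (a `queries` object keyed by query ID, each with a `schema` and `properties`) reflects one reading of the matching API that should be confirmed against the API documentation of your yente version.

```python
import json
import urllib.request


def build_match_request(base_url, dataset, name, schema="Person"):
    """Return (url, payload) for a single-query call to /match/<dataset>."""
    url = f"{base_url.rstrip('/')}/match/{dataset}"
    # Assumed body shape: one named query with a schema and property values.
    payload = {"queries": {"q1": {"schema": schema, "properties": {"name": [name]}}}}
    return url, payload


def match(base_url, dataset, name):
    """POST the query to the dataset-scoped endpoint and decode the JSON reply."""
    url, payload = build_match_request(base_url, dataset, name)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Only the dataset segment of the path changes between collections, so the same helper can be pointed at europe or at any other collection defined in the manifest.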
OpenSanctions is free for non-commercial users. Businesses must acquire a data license to use the dataset.