Common issues experienced while deploying and operating yente.
By default, the self-hosted API checks for new OpenSanctions data releases every 30 minutes (see `YENTE_SCHEDULE`). New data releases are published several times a day. If new data is available, yente starts an indexing process, and requests switch to the new data once it is fully searchable (roughly 15 minutes later).
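Combining the 30-minute check interval with the roughly 15-minute indexing time gives a rough upper bound on how stale the API can be: a release published just after a check becomes searchable about 30 + 15 = 45 minutes later. The sketch below makes that arithmetic explicit. It is only an estimate and rests on two assumptions: checks fall on a fixed :00/:30 grid, and indexing always takes 15 minutes. Neither is a guarantee of how yente schedules its work.

```python
from datetime import datetime, timedelta

# Assumptions for this sketch: checks run on a fixed 30-minute grid
# (:00 and :30 past the hour) and indexing takes about 15 minutes.
CHECK_INTERVAL = timedelta(minutes=30)
INDEXING_TIME = timedelta(minutes=15)


def next_check(published: datetime, interval: timedelta = CHECK_INTERVAL) -> datetime:
    """Return the first scheduled check at or after `published`."""
    midnight = published.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = published - midnight
    # Ceiling division on timedeltas: round up to the next multiple of the interval.
    steps = -(-elapsed // interval)
    return midnight + steps * interval


def searchable_at(published: datetime) -> datetime:
    """Estimate when a release published at `published` becomes searchable."""
    return next_check(published) + INDEXING_TIME


release = datetime(2022, 10, 30, 9, 31)
print(searchable_at(release))  # 2022-10-30 10:15:00
```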
Here is a very quick tour of how the indexing process works:

1. yente fetches a metadata file from data.opensanctions.org which states the latest version of the OpenSanctions data that has been released.
2. If a newer version is available, yente downloads the entity data from data.opensanctions.org (a 500MB+ JSON file) and stores it on the `/tmp` volume of the container.
3. Once the data has been indexed into a new, timestamped index, yente creates an ES index alias from `yente-entities-all` to the latest snapshot of the index (e.g. `yente-entities-all-00220221030xxxx`) and deletes all older snapshots of the index.
Only once this initial index has been built will the `/search` and `/match` APIs work correctly. On the plus side, any future updates to the data are indexed first, so the switch-over to the new data is instantaneous.
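The alias switch described above can be illustrated with a toy model. The sketch uses a hypothetical in-memory `FakeCluster` in place of ElasticSearch. Queries always go through the `yente-entities-all` alias. A new snapshot is filled completely before the alias is repointed, so searches keep hitting the old data until the switch. Before the very first switch there is no alias at all, which corresponds to the `index_not_found_exception` case. This is an illustration of the idea, not yente's code. In a real cluster, the `_aliases` API lets you apply add/remove actions in one atomic request.

```python
from dataclasses import dataclass, field

ALIAS = "yente-entities-all"


@dataclass
class FakeCluster:
    """A toy, in-memory stand-in for an ElasticSearch cluster (hypothetical)."""
    indices: dict = field(default_factory=dict)  # index name -> list of entities
    aliases: dict = field(default_factory=dict)  # alias name -> index name

    def search(self, alias: str) -> list:
        # Queries always go through the alias, never to a snapshot directly.
        if alias not in self.aliases:
            raise LookupError("index_not_found_exception")
        return self.indices[self.aliases[alias]]


def reindex(cluster: FakeCluster, version: str, entities: list) -> None:
    """Fill a new snapshot, then repoint the alias and drop older snapshots."""
    new_index = f"{ALIAS}-{version}"
    cluster.indices[new_index] = []
    for entity in entities:
        # While this loop runs, searches still hit the previous snapshot.
        cluster.indices[new_index].append(entity)
    cluster.aliases[ALIAS] = new_index  # the switch-over is a single assignment
    for name in list(cluster.indices):
        if name != new_index:
            del cluster.indices[name]


cluster = FakeCluster()
try:
    cluster.search(ALIAS)
except LookupError as exc:
    print(exc)  # index_not_found_exception
reindex(cluster, "v1", ["entity-1"])
print(cluster.search(ALIAS))  # ['entity-1']
```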
My yente instance returns an `index_not_found_exception`. What's wrong?
This probably means that the initial index-building (described above) never completed. Check the following:
- Check whether the index alias `yente-entities-all` was created. If a timestamped index exists but the alias does not, indexing was most likely aborted half-way. This can happen because either a) the data could not be downloaded or stored in its entirety, or b) indexing of the entities was aborted, perhaps due to a lack of system memory or compute time.
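The check above can be scripted against ElasticSearch's standard REST endpoints. `GET /_cat/indices/<pattern>?format=json` lists matching indices, and `GET /_alias/<name>` returns 404 when an alias is missing. The sketch below is a minimal sketch under several assumptions. It assumes an unauthenticated cluster reachable at a URL you pass in, such as `http://localhost:9200`. The function names `diagnose` and `fetch_state` are hypothetical helpers, not part of yente. The interpretation logic is kept separate from the network calls so it can be read and tested on its own.

```python
import json
import urllib.error
import urllib.request

ALIAS = "yente-entities-all"


def diagnose(index_names: list, alias_target) -> str:
    """Interpret what exists in ElasticSearch after an indexing attempt."""
    snapshots = [n for n in index_names if n.startswith(ALIAS + "-")]
    if alias_target is not None:
        return f"ok: {ALIAS} points to {alias_target}"
    if snapshots:
        return "aborted: snapshot(s) exist but the alias is missing: " + ", ".join(sorted(snapshots))
    return "never started: no yente snapshot indices found"


def fetch_state(es_url: str):
    """Read snapshot index names and the alias target from a live cluster.

    Assumes no authentication is required; add credentials if yours needs them.
    """
    with urllib.request.urlopen(f"{es_url}/_cat/indices/{ALIAS}-*?format=json") as resp:
        names = [row["index"] for row in json.load(resp)]
    try:
        with urllib.request.urlopen(f"{es_url}/_alias/{ALIAS}") as resp:
            target = next(iter(json.load(resp)))  # response is keyed by index name
    except urllib.error.HTTPError as err:
        if err.code != 404:  # 404 means the alias does not exist
            raise
        target = None
    return names, target


# Example with hypothetical index names (no cluster needed):
print(diagnose(["yente-entities-all-001"], None))
# Against a live cluster: print(diagnose(*fetch_state("http://localhost:9200")))
```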
While debugging this issue, you can call `http://yente-service:8000/updatez?token=UPDATE_TOKEN&force=true` to trigger a forced re-index of the data at any time. `UPDATE_TOKEN` is a secret token that you define in the environment of the yente pod.
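The forced re-index can also be triggered from a script. The sketch below builds the `/updatez` URL from the page's example host and sends the request. `urlencode` escapes the token in case it contains characters such as `&` or spaces. The use of `POST` is an assumption, not something this page states, so change `method` if your yente version expects something else. `updatez_url` and `trigger_reindex` are hypothetical helper names.

```python
import urllib.parse
import urllib.request


def updatez_url(base_url: str, token: str, force: bool = True) -> str:
    """Build the /updatez URL; urlencode escapes special characters in the token."""
    query = {"token": token}
    if force:
        query["force"] = "true"
    return f"{base_url.rstrip('/')}/updatez?{urllib.parse.urlencode(query)}"


def trigger_reindex(base_url: str, token: str, method: str = "POST") -> int:
    """Send the update request and return the HTTP status code.

    The HTTP method is an assumption; adjust it for your yente version.
    """
    req = urllib.request.Request(updatez_url(base_url, token), method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.status


print(updatez_url("http://yente-service:8000", "my-secret"))
# http://yente-service:8000/updatez?token=my-secret&force=true
```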
In some cases, you may want to customise the set of datasets that a query searches, e.g. to restrict it to a subset of relevant datasets. You can do this by defining a custom dataset collection in a manifest file like this:
```yaml
- url: "https://data.opensanctions.org/datasets/latest/index.json"
  # Limit the dataset scope of the entities which will be indexed into yente. Useful
  # values include `default`, `sanctions` or `peps`. This will speed up the update
  # process in which data is re-indexed.
- name: europe
  title: European datasets
  # - gb_hmt_sanctions
```
This will create a new dataset collection named `europe`, which can be used in query endpoints, e.g. `/search/europe`. Please note that the other datasets included in the `sanctions` collection will still be stored in the index, and attributes originating from those sources will still be included in the person and company profiles. If your use case requires building a completely custom dataset, please contact us.
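A custom collection is queried by putting its name in the endpoint path. The sketch below builds and sends a `/search/europe` request. It reuses the `yente-service:8000` host from the example above, which is an assumption about your deployment. The example query string is illustrative. The sketch returns the decoded JSON as-is and does not rely on any particular response fields. `search_url` and `search` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://yente-service:8000"  # assumption: adjust to your deployment


def search_url(collection: str, query: str, base_url: str = BASE_URL) -> str:
    """Build a /search/<collection> URL, e.g. for the `europe` collection."""
    path = urllib.parse.quote(collection)
    return f"{base_url}/search/{path}?{urllib.parse.urlencode({'q': query})}"


def search(collection: str, query: str) -> dict:
    """Run the search and return the decoded JSON response unchanged."""
    with urllib.request.urlopen(search_url(collection, query)) as resp:
        return json.load(resp)


print(search_url("europe", "Vladimir Putin"))
# http://yente-service:8000/search/europe?q=Vladimir+Putin
```

Results are restricted to entities from the collection's datasets. As noted above, however, their profiles may still carry attributes sourced from other datasets in the index.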
yente makes you a match.