yente is an open source data match-making API. It provides functions to search, retrieve or match FollowTheMoney entities, including people, companies or vessels that are subject to international sanctions.
Running yente requires a server that can host the main screening application (a lightweight Python application) and the ElasticSearch backend used to store and query entity information. In total, we anticipate 500 MB of memory per Python service, plus 2-4 GB of memory and 8-10 GB of disk volume for the ElasticSearch index. Running ElasticSearch on SSD-backed drives will produce a significant performance gain.
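To turn these figures into a rough memory budget: each yente API container adds about 500 MB on top of the memory reserved for ElasticSearch. The sketch below is a hypothetical helper (`estimate_memory_mb` is not part of yente); its default of 4096 MB for ElasticSearch is an assumption taken from the upper end of the stated 2-4 GB range.

```bash
# Hypothetical sizing helper; not part of yente.
# Usage: estimate_memory_mb <api_containers> [elasticsearch_mb]
estimate_memory_mb() {
  local api_containers="$1"
  local es_mb="${2:-4096}"   # assumption: upper end of the 2-4 GB range
  local per_api_mb=500       # ~500 MB per Python service
  echo $(( api_containers * per_api_mb + es_mb ))
}
```

For example, `estimate_memory_mb 4` prints `6096` (4 × 500 + 4096) and `estimate_memory_mb 1 2048` prints `2548`. Disk is budgeted separately: plan for 8-10 GB of volume space for the index.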
While it is possible to operate yente outside of Docker, we strongly encourage the use of containers as a simple means of dependency management and deployment. We provide pre-built containers of the latest released version of yente at ghcr.io/opensanctions/yente:latest.
For the docker-compose container orchestration tool, we provide an example docker-compose.yml in the repository. You can use it to get started with yente and later modify it to your individual needs:
```bash
mkdir -p yente && cd yente
wget https://raw.githubusercontent.com/opensanctions/yente/main/docker-compose.yml
docker-compose up
```
This will make the service available on port 8000 of the local machine. When the service is first started, you may have to wait five to ten minutes until it has finished indexing the data.
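Rather than guessing when indexing has finished, a startup script can poll the API until it reports readiness. The sketch below is a hypothetical helper (`wait_until` is not part of yente) that re-runs any probe command until it succeeds or a timeout expires. The `/readyz` path in the commented example is an assumption about the yente API; check the API documentation of your yente version before relying on it.

```bash
# Hypothetical helper: re-run a probe command until it succeeds or the timeout expires.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local waited=0
  until "$@"; do
    if (( waited >= timeout )); then
      echo "gave up after ${waited}s: $*" >&2
      return 1
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
}

# Example (assumes yente exposes a readiness endpoint at /readyz on port 8000):
# wait_until 900 10 curl -fsS -o /dev/null http://localhost:8000/readyz
```

`curl -f` exits with a non-zero status on HTTP error responses, so, assuming the endpoint returns an error status while the index is still being built, the loop keeps polling for up to 15 minutes. Use a positive interval in practice; an interval of 0 never advances the timeout counter.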
If you run the container in a cluster management system like Kubernetes, you will need to run both of the services defined in the compose file (the API and the ElasticSearch instance). We provide an example Kubernetes configuration in the repository. You may also need to grant the API container network policy permissions to fetch data from data.opensanctions.org once every hour so that it can update itself.
Note that in this configuration, the yente workers run with YENTE_AUTO_REINDEX disabled. Instead, reindexing is performed by a separate job that the cluster management system launches periodically.
yente tries to be gentle on resources: a single process on a reasonably modern CPU core can go a surprisingly long way. When scaling out, we recommend using Kubernetes or a managed container service (e.g. Google Cloud Run). In this model, you scale by launching more containers, each running a single worker process (the default) with access to one vCPU.
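Because each container runs one worker on one vCPU and needs roughly 500 MB of memory, the number of API replicas a node can host is bounded by whichever resource runs out first. The sketch below is a hypothetical helper (`max_api_replicas` is not part of yente) for that arithmetic; it assumes the budget you pass in excludes the memory reserved for ElasticSearch.

```bash
# Hypothetical helper: how many single-worker yente containers fit into a
# given CPU and memory budget, assuming 1 vCPU and ~500 MB each.
# Usage: max_api_replicas <vcpus> <memory_mb>
max_api_replicas() {
  local vcpus="$1" memory_mb="$2"
  local by_memory=$(( memory_mb / 500 ))   # integer division rounds down
  if (( vcpus < by_memory )); then
    echo "$vcpus"        # CPU is the limiting resource
  else
    echo "$by_memory"    # memory is the limiting resource
  fi
}
```

For example, `max_api_replicas 4 8192` prints `4` (CPU-bound), while `max_api_replicas 8 1500` prints `3` (memory-bound). On Kubernetes, the result could then be applied with `kubectl scale deployment/<name> --replicas=N`, where `<name>` stands for whatever your yente deployment is called.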
OpenSanctions is free for non-commercial users. Businesses must acquire a data license to use the dataset.