A screening data provider usually offers you one of two choices: an API to call, or a bulk data feed to ingest. OpenSanctions gives you both. The hosted API is metered per query. The bulk data license puts the full dataset in your own infrastructure, where you can load it into any screening or backend system.
You don't have to build the screening to use that data. yente, the open source engine behind our hosted API, runs on the same bulk data to give you the same service in-house: you get the same matching algorithms and scoring, but your query data never leaves your building. You license the data; running yente is free.
Because both options run the same engine, switching between them takes little work. A client that screens against api.opensanctions.org will work against your own instance once you change the base URL and the credential. You can start hosted and move on-premise later, or run both at once: hosted for development, on-premise for production.
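As a rough illustration of how small that switch can be, here is a minimal sketch of a client that reads its target from configuration. The environment variable names and the function are hypothetical. The `/match/default` path, the query payload shape, and the `ApiKey` header format are assumptions that should be checked against the API documentation for your version.

```python
import json
import os
import urllib.request

# Both values come from configuration, so the same code can target either deployment.
# The variable names are hypothetical; the default points at the hosted API.
BASE_URL = os.environ.get("SCREENING_BASE_URL", "https://api.opensanctions.org")
API_KEY = os.environ.get("SCREENING_API_KEY", "")


def build_match_request(
    name: str, base_url: str = BASE_URL, api_key: str = API_KEY
) -> urllib.request.Request:
    """Build (but do not send) a match request for a single person name."""
    # Assumed payload shape: one named query carrying a schema and its properties.
    payload = {"queries": {"q1": {"schema": "Person", "properties": {"name": [name]}}}}
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumed header format for the hosted API; a private instance may not need it.
        headers["Authorization"] = f"ApiKey {api_key}"
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/match/default",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Sending the request is then a single call such as `urllib.request.urlopen(build_match_request("Jane Doe"))`. Only the two configuration values differ between the hosted and the self-hosted target.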
The choice, then, comes down to a few considerations.
Some requirements settle the question outright
A few requirements should be considered before cost.
The most obvious is data residency. When you call the hosted API, the data you're screening (names, dates of birth, identifiers) is sent to our cloud services in Frankfurt for matching. That's fine for many of our customers. For some, it's a non-starter: if customer Personally Identifiable Information (PII) cannot leave your infrastructure (e.g. for regulatory reasons), running yente yourself keeps the query data in your environment, with no data-processor relationship to declare under the GDPR. Keeping query data in-house is one of the strongest reasons to run yente yourself.
However, it's worth checking that this requirement is real. In our experience, a regulator or supervisor rarely forbids sending screening queries to a data processor; much more often, the constraint is an internal inference, made out of caution, that hardens into a rule nobody actually wrote down. A genuine requirement would have a source you can point to: a clause in a contract with your own customers, a documented data-protection decision, or a sector rule. If you can point to such a source, the on-premise integration is the right choice regardless of cost.
Another yente capability forces on-premise use too: adding your own in-house watchlists to screen alongside our data. Only an instance you run can combine your private lists with ours, and, like data residency, that decision is based on capability rather than price.
The deciding cost is operating it, not licensing it
For everyone else, the decision looks like renting versus buying.
The hosted API is rent: you pay for what you use, month to month, with nothing to maintain. The bulk data license is the purchase — a fixed price, no per-query charge — and at high enough volume, buying looks obviously cheaper than renting forever.
That instinct is right, but the license is only the price of the house, not the cost of living in it. Operating the service (the search cluster, the monitoring, the upgrades) is the upkeep: a separate, ongoing cost that never arrives as an invoice, which makes it easy to overlook.
So the first question isn't when self-hosting gets cheaper. It's whether your organization can operate it at all. yente is production infrastructure: it needs a search backend (Elasticsearch or OpenSearch), enough memory and fast storage, and somewhere modern to run it — Docker Compose to start, Kubernetes to scale. Operating it well means three things:
- Standing it up properly. Sizing the search backend (an under-provisioned heap is the most common reason a deployment won't index), choosing your datasets, wiring up health checks and logging, and tuning matching thresholds to your risk appetite. Getting from "it runs" to production-ready is days to weeks of engineering, not an afternoon.
- Noticing when something breaks. The data updates itself: we publish several times a day, and yente reindexes in the background. What is not automatic is someone watching your instance. If a reindex fails, your team needs to catch that through its own monitoring.
- Staying current. A new version of yente ships roughly monthly. We recommend upgrading about every six months and not falling more than a year behind. Even at that cadence, this is recurring work: read the changelog, test the upgrade (major releases can break compatibility), roll it out without downtime.
None of this is extraordinary in software operations, but it assumes a modern deployment context and a DevOps function that owns it. The failure we most want to help you avoid is the quiet one: an under-maintained instance screening against weeks-old data while the team believes it's covered. In effect, that is a screening failure.
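One way to catch that quiet failure is an explicit freshness check in your monitoring. The following is a minimal sketch, assuming you can obtain the timestamp of the last successful reindex from your own instance or its logs; where that timestamp comes from is left open here. The function name and the 24-hour threshold are illustrative assumptions, not yente settings.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Illustrative threshold: the data is published several times a day,
# so an index older than a day is worth an alert. Tune this to your own needs.
MAX_INDEX_AGE = timedelta(hours=24)


def is_index_stale(
    last_indexed: datetime,
    max_age: timedelta = MAX_INDEX_AGE,
    now: Optional[datetime] = None,
) -> bool:
    """Return True when the last successful reindex is older than max_age."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_indexed > max_age
```

Wired into an alerting job, a `True` result would page someone instead of letting the instance keep answering from old data. Both timestamps should be timezone-aware so that the subtraction is well defined.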
Note the boundary, too: our support covers yente and the data, not running your search cluster. If operating a search cluster isn't something your team already does, count that in.
Where self-hosting starts to pay off
If you have that capacity, the next question is volume. The hosted API's appeal is the appeal of renting: what you pay scales with what you use, with no fixed outlay and nothing to maintain. At low and moderate volumes, that's a good deal. As a rough anchor, below a few tens of thousands of screenings a month, it's almost always the cheapest option.
Buying trades pay-as-you-go rent for a fixed license with no per-query charge. The point where it pays off — where the license plus the upkeep beats paying per query — comes later than a rent-versus-buy sum suggests, because the honest comparison is the purchase price plus the cost of ownership, not the purchase price alone. Price that upkeep against your own engineering rates rather than zero, and renting remains a better deal for longer than the headline numbers imply.
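The comparison can be sketched as simple arithmetic. This is a minimal illustration, assuming a flat per-query price and costs expressed per month in the same currency unit. The function name is hypothetical, and the numbers in the usage note are made-up placeholders, not actual prices.

```python
import math


def break_even_queries(
    license_per_month: float, upkeep_per_month: float, price_per_query: float
) -> int:
    """Smallest monthly query count at which self-hosting costs no more than paying per query."""
    if price_per_query <= 0:
        raise ValueError("price_per_query must be positive")
    # Self-hosting is a fixed cost; the hosted API cost grows linearly with volume.
    fixed_cost = license_per_month + upkeep_per_month
    return math.ceil(fixed_cost / price_per_query)
```

With placeholder values of 1000 for the license and 0.5 per query, ignoring upkeep gives a break-even of 2000 queries a month. Adding 1500 of monthly upkeep moves it to 5000, which is the effect described above: the more honestly the upkeep is priced, the later the crossover.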
The pattern of your usage also tips the balance. Steady, moderate transactional screening is what renting is built for. Workloads that re-screen a large portfolio daily or run continuous monitoring generate high, sustained query volumes, and at that scale, a fixed license is a better fit. Spiky, batch-heavy, monitoring-driven workloads reach the point where buying pays off much sooner than steady ones.
A sensible default: start hosted
For a new integration with no requirement that settles the question up front, the safest path is usually to start on the hosted API and move to self-hosting when usage, customization, or compliance clearly justifies the operational commitment. Because both run on the same software, moving on-premise later costs little on the client side: spin up a yente instance, change the base URL, and change the credential. Done.
If you're weighing it up, the on-premise documentation covers what running yente involves, and the licensing page covers the bulk data side. Or get in touch, and we'll help you model it against your own volume and decide which integration best suits your needs.
