Credit: MohitSingh, [CC BY-SA 3.0] via Wikimedia Commons
In June 2025, the Office of Foreign Assets Control (OFAC) imposed a landmark $215,988,868 penalty on GVA Capital Ltd., a venture capital firm based in San Francisco, California. According to the enforcement release, the penalty was issued after the firm violated U.S. sanctions on Russia through its dealings with Russian billionaire Suleiman Kerimov. The firm also failed to respond to an OFAC subpoena.
Cases like this highlight OFAC’s increasing focus on “gatekeepers” (the civil monetary penalty imposed was the statutory maximum), but they also underscore the need for screening checks that go beyond sanctions lists.
While sanctions lists are essential for legal compliance, enforcement actions document the penalties for violating sanctions, such as monetary fines or legal settlements. These actions often involve individuals or companies that do business with sanctioned entities but do not themselves appear on any sanctions list.
We’ve recently expanded our database to include enforcement actions and regulatory notices. These actions typically involve civil monetary penalties, settlement agreements, or warning letters for violations such as conducting business with sanctioned parties, failing to freeze designated assets, or having inadequate compliance programs. The notices released online serve both as punishment for violations and as deterrence for others, with agencies publicly disclosing significant cases to provide guidance on sanctions compliance.
US federal agencies account for a significant portion of this collection, which includes formal legal proceedings from OFAC as well as penalties issued by other regulatory bodies such as the U.S. Securities and Exchange Commission (SEC), the Commodity Futures Trading Commission (CFTC), and the Federal Reserve.
Structured data from free text
Enforcement actions are often published as press releases or online notices, in HTML or PDF format. While we're no strangers to the creative ways people come up with to publish a list of names, reliably extracting the entities named in free text has, until now, proven challenging.
Here at OpenSanctions, data quality is central to what we do. When we publish a data set, we want to be confident that:
- It is complete, at least for a well-defined time window
- It is up to date, at least for a clear, published update frequency
- It is accurate and true to the source
When entities (such as people or companies) are listed in free text with a narrative description of the offences they’re accused of, we need to turn that unstructured text into structured data — and formatting inconsistencies can often make that process more complex.
For example, the CFTC enforcement actions almost always set a defendant's name in bold (<b> or <strong> tags), except in the few cases where they don't. If we miss those untagged names, we end up with an incomplete dataset.
Headings are sometimes also set in bold, which means we need to weed them out to avoid creating entities named after headings that just happen to be formatted like names, for example “Cases Background” or “Parallel Criminal Action”.
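As a rough sketch of what this kind of extraction involves, assuming the HTML is parsed with BeautifulSoup and using an illustrative (not exhaustive) list of heading phrases, collecting bold-tagged names while weeding out headings might look something like this:

```python
from bs4 import BeautifulSoup

# Illustrative only: the real pages vary, and this heading list is hypothetical.
HEADING_PHRASES = {"cases background", "parallel criminal action"}

def extract_bold_names(html: str) -> list[str]:
    """Collect the text of <b>/<strong> tags, skipping spans that look like headings."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for tag in soup.find_all(["b", "strong"]):
        text = tag.get_text(" ", strip=True)
        if not text or text.lower().rstrip(":") in HEADING_PHRASES:
            continue  # a bold heading, not a defendant
        names.append(text)
    return names

print(extract_bold_names(
    "<p><strong>Parallel Criminal Action</strong></p>"
    "<p>The CFTC charged <b>Jane Doe</b> and <b>Acme Futures LLC</b>.</p>"
))
# ['Jane Doe', 'Acme Futures LLC']
```

A filter like this still misses names that were never tagged in the first place, which is exactly the gap in coverage described above.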
The Natural Language Processing (NLP) field offers a range of approaches to extract named entities from free text. But each dataset brings its own twists and turns, and as a result, trade-offs in accuracy. For instance, datasets can include:
- Any number of aliases
- Any number of addresses
- Other optional identifiers like IMO numbers or company numbers
- A variety of languages and alphabets, such as Arabic and Cyrillic
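To make that variability concrete, here is a purely hypothetical record of the kind a single enforcement notice might yield; every field except the primary name can be missing or repeated:

```python
# Hypothetical example record; names, addresses and numbers are invented.
record = {
    "name": "Example Shipping LLC",
    "aliases": ["Example Shipping", "ООО «Пример Шиппинг»"],  # alias in Cyrillic
    "addresses": [
        "1 Harbour Road, Valletta, Malta",
        "12 Nevsky Prospekt, St. Petersburg, Russia",
    ],
    "imo_number": "1234567",       # only present for vessels or ship operators
    "registration_number": None,   # often simply not published
}
```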
The advent of Large Language Models (LLMs) and the ongoing development of AI have been transformative. We can now write prompts that help us extract clean data from free-text sources and quickly transform it into structured data in the Follow The Money data format we use to organise and list entities consistently. We can even target entities that are subject to enforcement actions, excluding people mentioned in the releases who are not subject to enforcement — so-called bystanders, such as the regulators themselves.
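As a simplified sketch of the idea (not our production pipeline), the model is asked for a constrained JSON structure, which is then mapped onto Follow The Money entities. Here the call_llm() function is a placeholder for whichever model API is used, and the prompt and JSON shape are illustrative; the mapping uses the followthemoney Python library:

```python
import json
from followthemoney import model

PROMPT = """Extract every person or company that is the subject of this enforcement
action. Ignore bystanders such as the regulator and its staff. Return JSON shaped as:
{"entities": [{"schema": "Person" or "Company", "name": "...", "aliases": [], "country": ""}]}
"""

def call_llm(prompt: str, text: str) -> str:
    """Placeholder: call an LLM and return its JSON response as a string."""
    raise NotImplementedError

def extract_entities(press_release: str):
    data = json.loads(call_llm(PROMPT, press_release))
    for item in data["entities"]:
        proxy = model.make_entity(item["schema"])      # e.g. Person or Company
        proxy.make_id("enforcement", item["name"])     # deterministic ID helps deduplication
        proxy.add("name", item["name"])
        for alias in item.get("aliases", []):
            proxy.add("alias", alias)
        if item.get("country"):
            proxy.add("country", item["country"])
        yield proxy.to_dict()                          # Follow The Money JSON
```

Mapping the output onto Follow The Money proxies keeps the extracted entities consistent with the rest of our datasets.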
While the capabilities of LLMs are improving at an impressive pace, to uphold our quality standards, we decided that — as a matter of policy — all data extracted by an LLM will be reviewed by a human moderator before publication. Moderators can make corrections if needed, and only when the result is accepted does it become officially part of the dataset.
A more complete picture — thanks to rich data
Extracting rich, structured entity data from press releases and other free-text sources provides OpenSanctions users with a more complete picture of an entity or individual. We can often extract a range of attributes that support identification and deduplication for screening.
On top of that, the profile of each entity mentioned in an enforcement proceeding can now link directly to the press release(s) they’ve been mentioned in, providing narrative details on specific actions and expanding on the nature of the offences.
Enforcement action releases where Sam Bankman-Fried is named
It’s also worth noting that groups of related entities can often be found in the same press release, providing a deeper insight into an individual’s network and connections:
People named in one enforcement action release along with Sam Bankman-Fried
Adding narrative to OFAC sanctions
Similar to enforcement actions, OFAC press releases provide interesting background and context, particularly when a group of entities is sanctioned.
Sanctions designations are usually set against a backdrop of broader geopolitical events, and these press releases can shed light on how — and where — an entity fits in. This context gives companies relevant information when screening customers and suppliers, deepening due diligence checks and strengthening risk assessments; it can also give journalists useful background or further leads.
The Enforcements collection
The ability to extract this data at the level of automation and quality we expect has prompted us to create the Enforcements collection, which currently comprises seven datasets and contains over 20,000 entities. Browse those datasets here.
