How we score matches

The OpenSanctions API supports matching of entities using a simple query-by-example mechanism. For transparency, this page lists the weight given to each feature used in that API.
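To make "query-by-example" concrete, the sketch below builds one example entity as a plain data structure. It assumes the request body groups named queries under a `queries` key, each with a `schema` and a dictionary of multi-valued `properties`. The helper `build_match_query`, the query key `q1`, and all sample values are hypothetical and only serve as illustration. No network request is made.

```python
import json


def build_match_query(schema, **properties):
    """Build one example entity: a schema name plus lists of property values."""
    return {
        "schema": schema,
        "properties": {key: list(values) for key, values in properties.items()},
    }


# Hypothetical example entity; every value here is made up for illustration.
payload = {
    "queries": {
        "q1": build_match_query(
            "Person",
            name=["Jane Example"],
            birthDate=["1970-01-01"],
            nationality=["gb"],
        )
    }
}

print(json.dumps(payload, indent=2))
```

The point is that the caller describes the entity it is looking for, rather than writing a search expression. The matcher then compares that description with known entities.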

The API uses a simple entity comparison model based on logistic regression. Both the training data and the code are fully public, inviting public scrutiny and proposals for improvement.
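As a reminder of what a logistic regression model computes, the standard form is given below. The symbols are notation introduced here, not taken from the API: $x_i$ is the value of feature $i$ for a pair of entities, $w_i$ is its coefficient, and $b$ is the model's intercept.

```latex
\[
  \mathrm{score}(x) = \sigma\!\Big(b + \sum_{i} w_i \, x_i\Big),
  \qquad
  \sigma(z) = \frac{1}{1 + e^{-z}}
\]
```

A positive coefficient pushes the score towards 1 (a likely match) when its feature fires, and a negative coefficient pushes it towards 0.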

Matching features

| Feature | Coefficient | Description |
| --- | --- | --- |
| name_match | 1.234 | Check for exact name matches between the two entities. |
| name_token_overlap | 0.075 | Evaluate the proportion of identical words in each name. |
| name_numbers | -0.221 | Find if names contain numbers, score if the numbers are different. |
| name_levenshtein | 0.434 | Consider the edit distance (as a fraction of name length) between the two most similar names linked to both entities. |
| phone_match | 0.040 | Matching phone numbers between the two entities. |
| email_match | 0.034 | Matching email addresses between the two entities. |
| identifier_match | 0.214 | Matching identifiers (e.g. passports, national ID cards, registration or tax numbers) between the two entities. |
| dob_matches | 1.190 | The birth date or incorporation date of the two entities is the same. |
| dob_year_matches | 0.224 | The birth date or incorporation year of the two entities is the same. |
| first_name_match | 0.068 | Matching first/given name between the two entities. |
| family_name_match | 0.105 | Matching family name between the two entities. |
| birth_place | -0.106 | Same place of birth. |
| gender_mismatch | -0.261 | Both entities have a different gender associated with them. |
| country_mismatch | -0.323 | Both entities are linked to different countries. |
| org_identifier_match | 0.547 | Matching identifiers (e.g. registration or tax numbers) between two organizations or companies. |
| address_match | 0.902 | Text similarity between addresses. |
| address_numbers | 0.218 | Find if addresses contain numbers, score if the numbers are different. |
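The sketch below shows how coefficients like these could be combined into a single score with the usual logistic regression formula. It is an illustration, not the API's actual code. Two things are assumptions: feature values are taken to lie between 0 and 1, and the intercept, which the table does not list, is set to a placeholder of 0.0. The function `score` and the example feature values are hypothetical.

```python
import math

# Coefficients copied from the table above.
COEFFICIENTS = {
    "name_match": 1.234,
    "name_token_overlap": 0.075,
    "name_numbers": -0.221,
    "name_levenshtein": 0.434,
    "phone_match": 0.040,
    "email_match": 0.034,
    "identifier_match": 0.214,
    "dob_matches": 1.190,
    "dob_year_matches": 0.224,
    "first_name_match": 0.068,
    "family_name_match": 0.105,
    "birth_place": -0.106,
    "gender_mismatch": -0.261,
    "country_mismatch": -0.323,
    "org_identifier_match": 0.547,
    "address_match": 0.902,
    "address_numbers": 0.218,
}

# Assumption: the table does not give the model's intercept, so this
# value is a placeholder and not the real one.
INTERCEPT = 0.0


def score(features, intercept=INTERCEPT):
    """Combine feature values into a score in (0, 1) using the logistic function.

    `features` maps feature names to values; missing features count as 0.
    """
    unknown = set(features) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown features: {sorted(unknown)}")
    z = intercept + sum(COEFFICIENTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical pair: same name and birth date, but different countries.
example = {"name_match": 1.0, "dob_matches": 1.0, "country_mismatch": 1.0}
print(f"{score(example):.3f}")
```

With the placeholder intercept, a pair with no firing features scores exactly 0.5. The real model's baseline depends on its actual intercept, so absolute scores from this sketch should not be compared with API results.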