How we score matches

The OpenSanctions API supports entity matching through a simple query-by-example mechanism. For transparency, this page lists the feature weights that the API uses.
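To make "query by example" concrete, the sketch below builds one example entity and wraps it in a request body. The layout shown (a `queries` object keyed by an arbitrary ID, with each entry carrying a `schema` and a `properties` map of value lists) is an assumption that does not come from this page, and all the values are made up for illustration. Check the API documentation for the authoritative format.

```python
import json

# Hypothetical example entity: every value here is invented for illustration.
example_entity = {
    "schema": "Person",
    "properties": {
        "name": ["Jane Example"],
        "birthDate": ["1970-01-01"],
        "nationality": ["de"],
    },
}

# Assumed request shape: named queries, so several examples can be
# matched in one call and the results told apart by their key.
payload = {"queries": {"q1": example_entity}}

# Serialize the body that would be sent to the matching endpoint.
body = json.dumps(payload, indent=2)
print(body)
```

Each property holds a list because an entity may have several names, dates, or countries attached to it.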

The API uses a simple entity comparison model based on logistic regression. Both the training data and the code are fully public, inviting public scrutiny and proposals for improvement.

Matching features

Feature             Weight  Description
birth_place         -0.107  Same place of birth.
country_mismatch    -0.263  The two entities are linked to different countries.
email_match          0.114  Matching email addresses between the two entities.
family_name_match    0.053  Matching family name between the two entities.
first_name_match     0.119  Matching first/given name between the two entities.
gender_mismatch     -0.271  The two entities have different genders associated with them.
identifier_match     0.353  Matching identifiers (e.g. passports, national ID cards, registration or tax numbers) between the two entities.
key_day_matches      1.229  The birth date or incorporation date of the two entities is the same.
key_year_matches     0.278  The birth year or incorporation year of the two entities is the same.
name_levenshtein     0.698  Edit distance (as a fraction of name length) between the two most similar names linked to the two entities.
name_match           0.728  Exact name matches between the two entities.
name_numbers        -0.197  Checks whether the names contain numbers; scores if the numbers are different.
name_token_overlap   0.208  Proportion of identical words in each name.
phone_match          0.062  Matching phone numbers between the two entities.