The OpenSanctions API supports matching of entities using a simple query-by-example mechanism. For transparency, the weights of the features used by that API are listed below.
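To illustrate what "query by example" means in practice, here is a minimal sketch that builds an example entity and wraps it in a request body. The payload shape (`queries`, `schema`, `properties`), the property names, and the helper `example_entity` are assumptions made for illustration; consult the API documentation for the authoritative request format.

```python
import json


def example_entity(schema, **properties):
    """Build an example entity; every property value is stored as a list of strings.

    Hypothetical helper: the schema/properties layout is an assumption,
    not a confirmed description of the API's request format.
    """
    normalised = {}
    for name, value in properties.items():
        # Accept a single string or an iterable of strings.
        normalised[name] = [value] if isinstance(value, str) else list(value)
    return {"schema": schema, "properties": normalised}


# One request may carry several examples, each under its own key (assumed layout).
payload = {
    "queries": {
        "q1": example_entity("Person", name="Jane Doe", birthDate="1970-01-01"),
    }
}

print(json.dumps(payload, indent=2))
```

The idea is that the caller describes the entity they are looking for, and the service scores candidates against that description rather than against a free-text search string.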
The API uses a simple entity comparison model based on logistic regression. Both the training data and the code are fully public, inviting public scrutiny and proposals for improvement.
Feature | Coefficient | Description |
---|---|---|
birth_place | -0.107 | Same place of birth. |
country_mismatch | -0.263 | The two entities are linked to different countries. |
email_match | 0.114 | Matching email addresses between the two entities. |
family_name_match | 0.053 | Matching family name between the two entities. |
first_name_match | 0.119 | Matching first/given name between the two entities. |
gender_mismatch | -0.271 | The two entities have different genders associated with them. |
identifier_match | 0.353 | Matching identifiers (e.g. passports, national ID cards, registration or tax numbers) between the two entities. |
key_day_matches | 1.229 | The birth date or incorporation date of the two entities is the same. |
key_year_matches | 0.278 | The birth year or incorporation year of the two entities is the same. |
name_levenshtein | 0.698 | Consider the edit distance (as a fraction of name length) between the two most similar names linked to both entities. |
name_match | 0.728 | Check for exact name matches between the two entities. |
name_numbers | -0.197 | Check whether the names contain numbers; score if the numbers are different. |
name_token_overlap | 0.208 | Evaluate the proportion of identical words in each name. |
phone_match | 0.062 | Matching phone numbers between the two entities. |
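As a rough illustration of how a logistic regression model turns the coefficients above into a score, the sketch below sums each feature value multiplied by its coefficient and passes the total through the logistic function. The coefficients are copied from the table; the intercept is not listed there, so the `intercept` parameter (defaulting to 0.0) is an assumption, as is the idea that each feature value lies between 0 and 1. This is not the API's actual implementation.

```python
import math

# Coefficients copied from the table above.
COEFFICIENTS = {
    "birth_place": -0.107,
    "country_mismatch": -0.263,
    "email_match": 0.114,
    "family_name_match": 0.053,
    "first_name_match": 0.119,
    "gender_mismatch": -0.271,
    "identifier_match": 0.353,
    "key_day_matches": 1.229,
    "key_year_matches": 0.278,
    "name_levenshtein": 0.698,
    "name_match": 0.728,
    "name_numbers": -0.197,
    "name_token_overlap": 0.208,
    "phone_match": 0.062,
}


def score(features, intercept=0.0):
    """Return a logistic score in (0, 1) for a dict of feature values.

    `intercept` is a hypothetical parameter: the table does not publish one.
    Features missing from `features` are treated as 0.0.
    """
    unknown = set(features) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown features: {sorted(unknown)}")
    # Linear combination of feature values and their coefficients.
    z = intercept + sum(COEFFICIENTS[name] * value for name, value in features.items())
    # The logistic function squashes the sum into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


# Two entities with an exact name match and the same birth date.
strong = score({"name_match": 1.0, "key_day_matches": 1.0})
# The same pair, but linked to different countries.
weaker = score({"name_match": 1.0, "key_day_matches": 1.0, "country_mismatch": 1.0})
print(strong, weaker)
```

Positive coefficients push the score towards a match and negative ones pull it away, which is why adding `country_mismatch` lowers the result in the second call.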