Frequently asked questions

#24: How are entity captions used?

Category: Data structure

Data about a single entity is often aggregated from multiple sources. These sources may refer to the same person or company by different names, such as "Emmanuel Macron", "MACRON, Emmanuel" or "Эмманюэль Макрон". While the full list of available names is provided as part of the entity data, you can also use the caption field, which holds an algorithmically picked, preferred display name for the entity. The following criteria are applied when picking a display name:

  • Names written in the Latin alphabet are preferred over names in other scripts.
  • If multiple sources provide the same name, that name is given preference.
  • A title-cased name (Emmanuel Macron) is picked over an upper-cased one (Emmanuel MACRON).

If these criteria do not single out one name, the name with the lowest combined edit distance to all other names is chosen.
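
To make the criteria more concrete, here is a minimal Python sketch of one way such a selection could work. It is an illustration under stated assumptions, not the actual implementation: the function names (pick_caption, is_latin, narrow, edit_distance) are hypothetical, the criteria are assumed to be applied as successive filters in the order listed, and the input is assumed to be a flat list with one entry per source mention, so that a repeated entry means several sources gave the same name.

```python
import unicodedata
from collections import Counter


def is_latin(name):
    """True if every alphabetic character in the name is a Latin letter."""
    return all("LATIN" in unicodedata.name(ch, "") for ch in name if ch.isalpha())


def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def narrow(candidates, predicate):
    """Keep candidates matching the predicate; if none match, keep them all."""
    kept = [c for c in candidates if predicate(c)]
    return kept or candidates


def pick_caption(names):
    """Pick a display name from a list of names (hypothetical helper).

    Assumption: each criterion only narrows the candidate set, and the
    edit-distance rule decides among whatever candidates remain.
    """
    counts = Counter(n.strip() for n in names if n.strip())
    if not counts:
        return None
    candidates = list(counts)

    # 1. Prefer names written in the Latin alphabet.
    candidates = narrow(candidates, is_latin)

    # 2. Prefer the name(s) given by the largest number of sources.
    top = max(counts[c] for c in candidates)
    candidates = [c for c in candidates if counts[c] == top]

    # 3. Prefer title-cased names over upper-cased ones.
    candidates = narrow(candidates, str.istitle)

    # Fallback: lowest combined edit distance to all other names.
    return min(
        candidates,
        key=lambda c: sum(edit_distance(c, other) for other in counts if other != c),
    )


names = [
    "Emmanuel Macron",
    "MACRON, Emmanuel",
    "Эмманюэль Макрон",
    "Emmanuel Macron",
    "Emmanuel MACRON",
]
print(pick_caption(names))  # Emmanuel Macron
```

In this sketch a criterion that matches no candidate is skipped rather than emptying the list, so an entity with only non-Latin names still gets a caption. Whether the real selection treats the criteria as strict filters or as weighted preferences is not stated here, so treat the ordering as an assumption.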
