
Challenges of Matching in Uzbek

By Guest writer: Artem Sentsov, CTO of ClearPic

As ClearPic set out to compile a due diligence database for Central Asia, a significant challenge was matching records from diverse sources that refer to the same people or companies (entities), a problem called entity resolution. Given the diversity of languages spoken in Central Asia, including Uzbek, Mongolian, Tajik, and others, entity resolution technologies must handle text data in multiple languages and character sets.[1] This requires specialized knowledge of each language’s linguistic and cultural nuances, as well as access to data sources and other resources that can support the entity resolution process. This blog focuses on the personal names of Uzbekistan, which, as in any country, have been heavily influenced by its history and culture.

Present-day Uzbekistan has attracted a wide range of peoples over its long history, as it is advantageously located at an oasis in a desert region of Central Asia between two rivers, the Syr Darya and the Amu Darya. Unsurprisingly, it was a stop on the Great Silk Road. Most recently, following invasion by the Russian Empire in the 1860s, Uzbekistan became a republic of the Soviet Union in 1924, finally gaining independence in 1991.

The move from Cyrillic, Soviet-style names to Latin-script ones that followed independence has caused a number of discrepancies in personal name spellings. For example, the letter “х” is often represented by “kh” and the letter “ж” could be represented by “j” or “dj”. Thus, the surname Хўжаев has been rendered variably, from Soviet-era forms like “Ходжаев” to modern Latin versions such as “Hodjaev” or “Xo’jayev.” Even when a name is already in Latin script, further anglicization occurs, leading to variations like “Khodjaev.” Likewise, the surname written “Якубов” or “Ёқубов” in Cyrillic becomes “Yakubov” or “Yoqubov,” respectively, in Latin script.
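To make the scale of the problem concrete, here is a minimal Python sketch that enumerates the Latin spellings one Cyrillic surname can produce when each letter has several accepted renderings. The `LATIN_OPTIONS` table and the `latin_variants` function are hypothetical names for illustration. The table covers only the letters of one surname and is an assumption based on the examples above, not an official transliteration standard or the method used by any particular matching product.

```python
from itertools import product

# Illustrative and deliberately incomplete: each Cyrillic letter maps to
# Latin renderings a matcher might encounter. These alternatives are
# assumptions drawn from the examples in the text, not a standard.
LATIN_OPTIONS = {
    "х": ["x", "kh", "h"],
    "ў": ["o'", "o", "u"],
    "ж": ["j", "dj"],
    "а": ["a"],
    "е": ["e", "ye"],
    "в": ["v"],
}

def latin_variants(cyrillic_name: str) -> set:
    """Return every Latin spelling the table can produce for a name."""
    # Letters missing from the table are passed through unchanged.
    per_letter = [LATIN_OPTIONS.get(ch, [ch]) for ch in cyrillic_name.lower()]
    return {"".join(parts).capitalize() for parts in product(*per_letter)}

variants = latin_variants("Хўжаев")
print(len(variants))            # 3 * 3 * 2 * 1 * 2 * 1 = 36 candidate spellings
print("Xo'jayev" in variants)   # True
print("Khodjaev" in variants)   # True
```

Even this tiny table yields dozens of candidates for a six-letter surname, which is why exact string comparison fails on such records.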

Uzbek personal names typically consist of two parts: the given name and the family name. The given name is usually chosen based on cultural or religious traditions and may be a Muslim name. The family name is usually derived from the father’s name, with the suffix “ov” or “ev” added for men and “ova” or “eva” added for women. Uzbek personal names may also be influenced by neighboring countries and cultures, such as Russia, Kazakhstan, and Tajikistan.

Patronymic names in Uzbekistan are derived from the father’s first name. They are constructed by adding “o'g'li” (son of) or “qizi” (daughter of) after the father's first name. For example, if the father’s name is Olimjon, his son's patronymic name would be Olimjon o'g'li, which means son of Olimjon. Similarly, his daughter's patronymic name would be Olimjon qizi, which means daughter of Olimjon.

Patronymic names are used as a form of address and identification in Uzbek culture, especially in formal settings such as in legal documents or in addressing government officials.

In Uzbekistan, it is also common to use the Soviet form of patronymic names, which adds the suffix “ovich” or “ovna” to the father’s first name instead of o'g'li or qizi. For example, if the father’s name is Olimjon, his son's patronymic name could also be Olimjonovich, which means son of Olimjon, in the Soviet-style format. Similarly, his daughter's patronymic name could be Olimjonovna. All these variations make the entity resolution process extremely challenging.
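One way to compare the two patronymic styles is to reduce both to the father's given name. The sketch below is a simplified illustration under stated assumptions: the function name `father_name_key` is hypothetical, the input is assumed to be already in Latin script, and only the markers mentioned in this post (plus the “evich”/“evna” variants seen in names like Kamilevna) are handled. Real data would need many more rules.

```python
import re

# Uzbek-style markers follow the father's name as a separate word;
# Soviet-style suffixes are attached directly to it.
_PATRONYMIC_MARKER = re.compile(
    r"(?:\s+(?:o'g'li|qizi)|(?:ov|ev)(?:ich|na))$", re.IGNORECASE
)

def father_name_key(patronymic: str) -> str:
    """Strip the patronymic marker and return the father's name in lowercase."""
    # Normalize typographic apostrophes (U+2019, U+02BB) to a plain one first.
    text = patronymic.strip().replace("\u2019", "'").replace("\u02bb", "'")
    return _PATRONYMIC_MARKER.sub("", text).lower()

for form in ["Olimjon o'g'li", "Olimjon qizi", "Olimjonovich", "Olimjonovna"]:
    print(form, "->", father_name_key(form))  # each prints "olimjon"
```

The key discards whether the person is a son or a daughter, so it is suited to finding candidate matches for review rather than to confirming identity on its own.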

Examples

Full Name: (with Uzbek patronymic for a son) Shohjahon Qodirov Olimjon o'g'li
In Cyrillic script: Шоҳжаҳон Қодиров Олимжон ўғли / Кодиров Шохжахон Олимжонович

Full Name: (with Uzbek patronymic for a daughter) Abdullayeva Munis Tuxtasin Qizi
In Cyrillic script: Абдуллаева Мунис Тухтасин кызы

Full Name: (with Soviet-like patronymic for a son) Akramov Sherzod Salimovich
In Cyrillic script: Акрамов Шерзод Салимович

Full Name: (with Soviet-like patronymic for a daughter) Asfandiyarova Tamila Kamilevna
In Cyrillic script: Асфандиярова Тамила Камильевна

The net result is that a person could legitimately have their name spelled very differently on different official documents, which makes connecting personal data across databases a huge challenge.
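As a rough baseline only, the sketch below scores two Latin-script full names with Python's standard `difflib` module after sorting their tokens, so that surname-first and given-name-first records compare equally. The function name `name_similarity` is hypothetical. Generic string similarity of this kind is an assumption chosen for illustration and falls well short of a dedicated name matching engine, which would also account for transliteration rules and patronymic styles.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Score two Latin-script full names from 0.0 to 1.0, ignoring token order."""
    def canonical(name: str) -> str:
        # Lowercase and sort the tokens so word order does not affect the score.
        return " ".join(sorted(name.lower().split()))
    return SequenceMatcher(None, canonical(a), canonical(b)).ratio()

# Same tokens in a different order score as identical.
reordered = name_similarity("Akramov Sherzod Salimovich", "Sherzod Akramov Salimovich")
print(reordered)  # 1.0

# Two spellings of the same surname score as only partially similar.
spelling_variant = name_similarity("Khodjaev", "Xo'jayev")
print(spelling_variant)
```

Scores for spelling variants depend heavily on which letters happen to differ, so any threshold chosen for such a baseline would need tuning against real records.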

Learn more in this case study about how leveraging the Cyrillic name matching algorithms in Babel Street’s Match Identity helped ClearPic create its master due diligence database. Today Match Identity is the matching engine that enables English speakers to screen entities against this master database of Central Asian entities written in Uzbek and other Cyrillic-based languages.

Guest blog author Artem Sentsov is CTO of ClearPic, a risk-management platform that enables organizations worldwide to comply with anti-money laundering (AML) and sanctions screening regulations. He spent the last few years leading a team to create a master due diligence business database from public records written in Uzbek, Tajik, and Mongolian, covering businesspeople and organizations in Central and Western Asia and the Caspian region.

Endnotes:

1. A character set is the set of characters (including letters, punctuation, symbols, and spaces) that are used by a language. English uses the ASCII character set, for example.

Disclaimer

All names, companies, and incidents portrayed in this document are fictitious. No identification with actual persons (living or deceased), places, companies, or products is intended or should be inferred.
