Names are inherently complex to analyze, match, and resolve in any language, but Thai names in particular are uniquely challenging to understand and match. Naming conventions in Thailand have a fascinating history and evolution that must be understood in order to build a name matching model that can parse Thai names properly.
If your data contains Thai names, there are a few must-haves to look for in Thai name matching systems.
Start simple: Learn your Thai ABCs
As with any language not written in the Latin alphabet, developers first need to understand the script. Like most Asian languages, Thai is not written in the Latin alphabet familiar to Western developers. Although any text can be transliterated into another script, the most accurate results come from analyzing text in its native script, with a model trained on that script.
The Thai script has 44 consonants, 15 vowel symbols that combine into at least 28 vowel forms, and 4 tone diacritics. As in Chinese, a change in tone can completely change the meaning of a word. Processing written Thai accurately requires language-specific text analytics models.
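As a small illustration of why Thai text needs script-aware handling, the sketch below inspects the one-syllable name ตู่ (Tuu) with Python's standard `unicodedata` module. It is only a demonstration of how the script is encoded, not part of any particular matching product: a single written syllable is stored as a base consonant followed by a vowel mark and a tone mark, so naive character counts and character-by-character comparisons can mislead.

```python
import unicodedata

# The name "Tuu", spelled with explicit code points so the order is unambiguous:
# consonant TO TAO, vowel mark SARA UU, tone mark MAI EK.
nickname = "\u0e15\u0e39\u0e48"

for ch in nickname:
    # Category "Lo" is a base letter; "Mn" is a non-spacing mark
    # rendered above or below the base letter.
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {unicodedata.name(ch)}")

categories = [unicodedata.category(ch) for ch in nickname]
code_points = len(nickname)
base_letters = categories.count("Lo")
print(f"{code_points} code points, {base_letters} base letter")
```

What looks like one character on screen is three code points, only one of which is a base letter.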
What’s in a (nick)name?
Historically, Thai people believed that certain letters, numbers, and words were lucky or unlucky. Parents consulted fortune tellers and astrologers to ensure they chose a lucky name for the time and date of their child’s birth. The result was long and complex first names that were considered spiritually advantageous but, in practice, were too unwieldy for everyday life.
Many Thai people also believed that evil spirits could harm children. Using a child’s name was thought to draw the attention of these spirits and put the child at risk. Instead of using a lengthy first name, parents gave children a simple, one- or two-syllable nickname at birth. Thai nicknames served a dual purpose. Practically, simple names were easier for everyday use. Spiritually, nicknames were thought to confuse malicious spirits and protect children.
Thai nicknames rarely have any phonetic connection to the given first name. Instead, they may refer to a physical trait of the child, be the name of an animal, derive from a foreign name, or simply be a made-up syllable. For example, a Thai woman with the given name ธัญมาศ (Thanyamas) can have the nickname หนู (Nu), meaning “mouse.” She would use the name Nu throughout her life, both socially and professionally, in all but the most formal legal documents.
Modern nicknames
While belief in superstition has waned, the custom of giving children a formal first name as well as a short nickname prevails, as does the preference to select nicknames that do not stem from first names.
Thai people may also gain additional nicknames as teens or adults, much as many Americans shorten or change their names. For example, a “William” may go by “Bill,” or an “Elizabeth” may prefer “Bizzy.” This practice is particularly common among Thai people who interact with Westerners frequently, especially if their Thai nickname sounds inappropriate or odd in other cultures.
For example, the Thai film director, producer, and screenwriter Apichatpong Weerasethakul (อภิชาติพงศ์ วีระเศรษฐกุล) also has the Americanized nickname “Joe,” derived from his Thai nickname “Jei.” Similarly, a Thai boy with the nickname “Fuk” (the name for a species of Thai green pumpkin) may decide to select a new, more “appropriate sounding” nickname.
Cross-token alignment
Simplistic name matching systems break names down into token fields (i.e., first, middle, last). Any data that contains name tokens in a different order (e.g., Sally Ride vs. Ride, Sally) would fail to match.
The ability to match tokens regardless of word order is particularly necessary for Thai names, in which a nickname may be mistakenly labelled as the given name and the given name labelled as a middle name.
For example, a former Prime Minister of Thailand is named ประยุทธ์ จันทร์โอชา (“Prayut Chan-o-cha” or “Prayuth Chan-ocha”), nicknamed ตู่ (“Tuu”). A name matching system that looks at each name field, or token, individually would incorrectly give the names “ตู่ ประยุทธ์ จันทร์โอชา” and “ประยุทธ์ จันทร์โอชา” a low match score.
Field | Name one | Name two | Token match score |
---|---|---|---|
First Name | ตู่ (Tuu) | ประยุทธ์ (Prayut) | 41.1% |
Middle Name | ประยุทธ์ (Prayut) | — | 0% |
Last Name | จันทร์โอชา (Chan-o-cha) | จันทร์โอชา (Chan-o-cha) | 100% |
Total match score: 47.03%
Instead, intelligent name matching looks at a name as a whole, and determines which tokens align with one another, regardless of word order.
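The contrast between field-by-field scoring and whole-name alignment can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scoring model behind the table above: `token_similarity` is a hypothetical stand-in built on the standard library's `difflib.SequenceMatcher`, so its numbers will differ from the 41.1% shown, and `field_score` and `aligned_score` are names invented for this example.

```python
from difflib import SequenceMatcher
from itertools import permutations
from typing import Optional

TUU = "ตู่"
PRAYUT = "ประยุทธ์"
CHAN_O_CHA = "จันทร์โอชา"

def token_similarity(a: Optional[str], b: Optional[str]) -> float:
    """Placeholder similarity in [0, 1]; a missing token scores 0.
    A real system would use a model trained on Thai script."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()

def field_score(fields_a: dict, fields_b: dict) -> float:
    """Naive approach: compare first with first, middle with middle, last with last."""
    keys = ("first", "middle", "last")
    return sum(token_similarity(fields_a.get(k), fields_b.get(k)) for k in keys) / len(keys)

def aligned_score(tokens_a: list, tokens_b: list) -> float:
    """Order-independent approach: the best average similarity over every pairing
    of the shorter name's tokens with distinct tokens of the longer name."""
    short, long_ = sorted((tokens_a, tokens_b), key=len)
    if not short:
        return 0.0
    return max(
        sum(token_similarity(a, b) for a, b in zip(short, candidate)) / len(short)
        for candidate in permutations(long_, len(short))
    )

name_one = {"first": TUU, "middle": PRAYUT, "last": CHAN_O_CHA}
name_two = {"first": PRAYUT, "middle": None, "last": CHAN_O_CHA}

naive = field_score(name_one, name_two)
aligned = aligned_score([TUU, PRAYUT, CHAN_O_CHA], [PRAYUT, CHAN_O_CHA])
print(f"field-by-field: {naive:.2f}, aligned: {aligned:.2f}")
```

Trying every pairing is affordable here because names have only a handful of tokens; a production system would more likely use an assignment solver. Note that this sketch ignores the unmatched nickname entirely, which is a separate design decision.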
Deletion penalty
Because Thai name data may or may not include nicknames, even on official documents like birth certificates, a database of Thai names is far more likely to contain names with varying numbers of tokens.
Like mismatched tokens, missing tokens hurt a name’s overall match score. A deletion penalty lowers the match score for two names that contain a different number of tokens. For example, the match score of “Will Smith” and “William Carroll Smith” will be penalized because the former is missing a middle name.
The ideal solution for matching Thai names would allow the user to adjust or eliminate the deletion penalty.
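One way to picture an adjustable deletion penalty is as a multiplier applied once per unmatched token. The sketch below is an assumption made for illustration: the function name `apply_deletion_penalty`, the multiplicative form, and the default value of 0.2 are all hypothetical rather than taken from any specific product.

```python
def apply_deletion_penalty(aligned_score: float,
                           token_count_a: int,
                           token_count_b: int,
                           deletion_penalty: float = 0.2) -> float:
    """Scale a match score down once for every token that has no partner.

    deletion_penalty is the fraction of the score lost per unmatched token:
    0.0 disables the penalty, 1.0 zeroes the score on any length mismatch.
    """
    if not 0.0 <= deletion_penalty <= 1.0:
        raise ValueError("deletion_penalty must be between 0 and 1")
    unmatched = abs(token_count_a - token_count_b)
    return aligned_score * (1.0 - deletion_penalty) ** unmatched

# "Will Smith" (2 tokens) vs. "William Carroll Smith" (3 tokens),
# with a hypothetical aligned score of 0.9 for the tokens that do pair up.
default_penalty = apply_deletion_penalty(0.9, 2, 3)
no_penalty = apply_deletion_penalty(0.9, 2, 3, deletion_penalty=0.0)
print(f"default: {default_penalty:.2f}, penalty disabled: {no_penalty:.2f}")
```

For Thai data, where an extra token is often just a nickname, setting the penalty at or near zero keeps a present-or-absent nickname from dragging down an otherwise strong match.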
The creation of Thai surnames
While first names and nicknames have a long history, Thai surnames are a much more recent phenomenon. In 1913, the Thai Nationality Act (also known as the Surname Act) was passed, requiring all permanent residents of Thailand to have surnames for the first time. Because surnames were uncommon before this law, many families simply made up a name. Understandably, they selected words with meanings that would reflect well on the family.
For example, the name of the Thai king, Maha Vajiralongkorn (Rama X), means “adorned with jewels or thunderbolts.” Thai royalty can also bestow honorary surnames on families; these are simply appended to the existing surname.
Ever-growing names
The Thai Nationality Act also required that each surname be unique. Families registered their chosen surname with the government, but had to alter it if the name they wanted was already in the registry. For example, if “Jaturapattara” is already registered, a family may choose something similar like “Jaturapattarapong” instead.
Having a registered Thai surname was also required of the large Chinese population that lived in Thailand in the early 20th century. Initially, many chose to use their Chinese surname preceded by the word “แซ่” (sae), Thai for “surname.” However, the requirement that each family have a unique name meant that any Chinese family with a common name had to add additional components, leading to increasingly lengthy names. Many of the longest Thai names seen today belong to people with Chinese-Thai ancestry.
No “Smith” or “Jones” in Thailand
Some overlap among Thai surnames does exist, because the technology available when the registry was created was not advanced enough to catch every duplicate. Regardless, Thai surnames are still far more distinctive than surnames in most other languages. If two Thai people share a last name, they are very likely to be at least distantly related.
Today, new Thai citizens must still register a unique surname, but new surnames can no longer be as long. The Person Name Act, passed in 1962, limits the length of new Thai names: a newly registered name may not have more than ten Thai letters, excluding vowel symbols and diacritics. Royally conferred titles and surnames, however, may exceed the ten-letter limit.
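The letter-counting rule can be sketched in code. The version below rests on an assumption that should be checked against the law's actual wording: it treats “Thai letters” as the 44 consonants in the Unicode Thai block (U+0E01 to U+0E2E, leaving out the vowel-like letters U+0E24 and U+0E26) and ignores every vowel symbol and diacritic. The function names are hypothetical.

```python
# Assumption: only the 44 Thai consonants count toward the limit.
# U+0E24 (RU) and U+0E26 (LU) sit in the same range but are vowel-like letters.
THAI_CONSONANTS = {chr(cp) for cp in range(0x0E01, 0x0E2F)} - {"\u0e24", "\u0e26"}

def count_thai_letters(name: str) -> int:
    """Count consonant letters, skipping vowel symbols, tone marks, and other signs."""
    return sum(1 for ch in name if ch in THAI_CONSONANTS)

def within_length_limit(name: str, limit: int = 10) -> bool:
    """Check a proposed new surname against the ten-letter limit described above."""
    return count_thai_letters(name) <= limit

surname = "จันทร์โอชา"  # Chan-o-cha
print(len(surname), "code points,", count_thai_letters(surname), "counted letters")
print("within limit:", within_length_limit(surname))
```

The gap between the code-point count and the counted letters shows why a plain string-length check is not a usable proxy for this rule.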
Weighting tokens
Some names, whether given or family names, are rarer than others. Rare names should accordingly be weighted more heavily when calculating a final name match score.
For example, two Johns in a database aren’t likely to be the same person because John is an extremely common name. By contrast, two database entries named Dweezil are far more likely to refer to the same person.
The same pattern holds true for Thai names. Because almost all Thai surnames are unique to the family, they are much rarer than surnames in other languages. The ideal solution for matching Thai names assigns weights to tokens based on uniqueness and offers the user the option to adjust the weighting model manually.
Surnames are more significant and distinctive than other name tokens in most languages, but this is particularly true for Thai names. Look for a solution that lets you weight surnames significantly more than other tokens.
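A rough sketch of rarity-based weighting follows. It assumes each token's weight is the negative logarithm of its relative frequency, plus an optional multiplier for surnames. The frequency figures, the `surname_boost` parameter, and the function names are hypothetical values chosen for illustration, not real statistics or a real API.

```python
import math

# Hypothetical relative frequencies; a real system would estimate these from data.
FREQUENCY = {"John": 0.03, "Dweezil": 0.000001, "Jaturapattara": 0.000001}

def rarity_weight(token: str, default_frequency: float = 0.000001) -> float:
    """Rarer tokens get larger weights: weight = -log(relative frequency)."""
    return -math.log(FREQUENCY.get(token, default_frequency))

def weighted_score(pairs: list, surname_boost: float = 1.0) -> float:
    """Weighted average of token similarities.

    pairs holds (token, similarity, is_surname) tuples; surname_boost lets the
    user manually raise the influence of surname tokens.
    """
    weighted_total = 0.0
    weight_sum = 0.0
    for token, similarity, is_surname in pairs:
        weight = rarity_weight(token) * (surname_boost if is_surname else 1.0)
        weighted_total += weight * similarity
        weight_sum += weight
    return weighted_total / weight_sum if weight_sum else 0.0

# Same given name, different surname vs. different given name, same surname.
only_given_matches = weighted_score([("John", 1.0, False), ("Jaturapattara", 0.0, True)])
only_surname_matches = weighted_score([("John", 0.0, False), ("Jaturapattara", 1.0, True)])
print(f"given name only: {only_given_matches:.2f}, surname only: {only_surname_matches:.2f}")
```

With these assumed frequencies, agreeing on a rare surname counts for far more than agreeing on a common given name, and raising `surname_boost` widens that gap further.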