Machine Learning engineers attempt to answer the eternal question: are there words or concepts that are “universally translatable” across the world’s different cultures and languages?


August 21, 2020


Concepts integral to the human condition exist independent of language, and vocabularies are used to name those concepts

BY:

Alexis de Hahn
Avocat Reporteur
PROJECT COUNSEL MEDIA

21 August 2020 (Athens, Greece) – To paraphrase Shakespeare, does a rose smell just as sweet if it is called rosa, ruža, or τριαντάφυλλο?

In a paper published on August 10, 2020 in the journal Nature Human Behaviour, the authors (Princeton computer scientist Bill Thompson, Seán G. Roberts of Cardiff University, and Gary Lupyan of the University of Wisconsin-Madison) used an algorithm to determine whether translation equivalents really mean the same thing in each language.

The team was led by Bill Thompson, who heads the Department of Computer Science at Princeton University, Princeton, NJ. I profiled him last year for his series of analyses showing that culture dramatically shapes the evolution of the human mind, giving us innate predispositions that only weakly constrain our behavior.

The answer, it turns out, depends on how similar speakers’ cultures are — in other words, it is easier to translate between two languages spoken by groups with similar cultures than between languages whose speakers have very different cultures. “Our results do not fully fit into either the universalist or relative perspectives,” the researchers wrote, referencing two opposing schools of thought in linguistics.

From a universalist viewpoint, concepts integral to the human condition exist independent of language, and vocabularies are used to name those concepts. By contrast, a relative perspective states that language vocabularies are influenced by culture, and speakers come to understand concepts, categories, and types while learning the language.

Although research, such as the 2007 study on “the Russian blues,” has suggested that language may affect perception, a final verdict in the universalist-relative debate has been hard to come by without a consistent way of quantifying similarities between languages.

Past studies have also typically been limited to the comparison of two languages at a time. The authors view their research “as an early attempt to quantify semantic alignment at scale using distributional semantics.”

Context counts

To compute semantic alignment (that is, how closely the meaning of a word in one language matches the meaning of its translation in another), researchers looked at the range of contexts in which a given word was used and the frequency with which it was used.

Their main analyses applied the fastText skipgram algorithm to language-specific versions of Wikipedia, and the analyses were replicated using embeddings derived from the OpenSubtitles2018 database and from a combination of Wikipedia and the Common Crawl dataset.

For each word (such as “beautiful” in English), the algorithm identified semantic neighbors, words that often appear nearby (e.g., “colorful,” “love,” and “sparkle”). It then translated those semantic neighbors into a target language (for example, French) and calculated their semantic similarity to the French equivalent, “beau.”

Next, the algorithm identified semantic neighbors of “beau” in French and translated them into English. The final similarity score for a word’s meaning quantifies how closely the semantics aligned in both directions of the translation.
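The two-way comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-dimensional vectors are invented toy values rather than fastText embeddings, the function names (cosine, neighbors, directed_alignment, alignment) are hypothetical, and averaging the forward and backward scores is only one plausible way to combine the two directions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def neighbors(word, emb, k):
    """The k words closest to `word` in one language's embedding space."""
    others = [w for w in emb if w != word]
    others.sort(key=lambda w: cosine(emb[word], emb[w]), reverse=True)
    return others[:k]

def directed_alignment(word, src_emb, tgt_emb, lexicon, k):
    """Translate the neighbors of `word` and average their similarity
    to the translation of `word` in the target space."""
    target = lexicon[word]
    sims = [
        cosine(tgt_emb[target], tgt_emb[lexicon[n]])
        for n in neighbors(word, src_emb, k)
        if n in lexicon  # skip neighbors that have no translation
    ]
    return sum(sims) / len(sims) if sims else float("nan")

def alignment(word, emb_a, emb_b, a_to_b, k=2):
    """Average of the forward and backward directed scores (an assumption)."""
    b_to_a = {tgt: src for src, tgt in a_to_b.items()}
    forward = directed_alignment(word, emb_a, emb_b, a_to_b, k)
    backward = directed_alignment(a_to_b[word], emb_b, emb_a, b_to_a, k)
    return (forward + backward) / 2

# Toy embeddings: invented numbers, for illustration only.
english = {
    "beautiful": (1.0, 0.1),
    "love": (0.9, 0.2),
    "sparkle": (0.8, 0.3),
    "table": (0.0, 1.0),
}
french = {
    "beau": (1.0, 0.2),
    "amour": (0.8, 0.3),
    "briller": (0.7, 0.1),
    "table": (0.1, 1.0),
}
en_to_fr = {"beautiful": "beau", "love": "amour",
            "sparkle": "briller", "table": "table"}

score = alignment("beautiful", english, french, en_to_fr)
print(f"alignment of 'beautiful' / 'beau': {score:.3f}")
```

In this toy setup the score is high because "beautiful" and "beau" keep the same company in both spaces; a word whose neighbors scatter after translation would score lower.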

This process was repeated for word forms for 1,010 concepts in 41 languages across 10 language families. Drawn from the NorthEuraLex (NEL) dataset, which is compiled from dictionaries and other linguistic resources that are available for individual languages in Northern Eurasia, those words spanned 21 semantic domains, including both concrete and abstract concepts.

Human judgments were used to validate the computed semantic alignment: researchers found a strong correlation between the algorithm's scores and similarity judgments made by native speakers for Dutch–English translation pairs, as well as a set of Japanese–English translatability ratings for 192 word pairs.
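A validation of this kind comes down to correlating two lists of numbers, one from the algorithm and one from human raters. The sketch below computes a Pearson correlation by hand. The article does not say which correlation statistic the authors used, so Pearson is an assumption here, and the scores and ratings are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical data: one entry per translation pair.
model_alignment = [0.91, 0.75, 0.60, 0.42, 0.30]  # algorithm's scores
human_rating = [6.5, 5.8, 4.9, 3.1, 2.7]          # e.g. a 1-7 similarity scale

r = pearson(model_alignment, human_rating)
print(f"correlation between model and human judgments: {r:.2f}")
```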

Most notably, the team used the semantic alignment measure to predict how consistently speakers of six languages would use the same term to name 750 images. Meanings with lower semantic alignment between languages were associated with less consistent name agreement across the six languages.

This exercise also confirmed the researchers’ prediction that larger differences in name agreement corresponded to lower overall alignment. For example, when shown a picture of a clothes hanger, 100% of Spanish speakers called it “percha”; 77% of English speakers called it a hanger; and 33% of Italian speakers called it “appendino.” Accordingly, Spanish and Italian might have a lower alignment than would Spanish and English.
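Name agreement, as used in the clothes-hanger example, can be read as the share of respondents who give the most common name for an image. The sketch below takes that reading, which is an assumption about the measure. The response lists are invented to reproduce the percentages quoted above, and the alternative names and their splits are placeholders, not data from the study.

```python
from collections import Counter

def name_agreement(responses):
    """Return the most common name and the share of respondents who used it."""
    counts = Counter(responses)
    modal_name, modal_count = counts.most_common(1)[0]
    return modal_name, modal_count / len(responses)

# Hypothetical responses for one image (100 respondents per language).
responses_by_language = {
    "Spanish": ["percha"] * 100,
    "English": ["hanger"] * 77 + ["coat hanger"] * 23,
    "Italian": (["appendino"] * 33 + ["gruccia"] * 30
                + ["attaccapanni"] * 20 + ["stampella"] * 17),
}

agreement = {
    language: name_agreement(responses)
    for language, responses in responses_by_language.items()
}

for language, (name, share) in agreement.items():
    print(f"{language}: '{name}' used by {share:.0%} of respondents")

# Difference in name agreement between two languages for this image.
gap_es_it = agreement["Spanish"][1] - agreement["Italian"][1]
gap_es_en = agreement["Spanish"][1] - agreement["English"][1]
```

Under the study's prediction, the larger gap between Spanish and Italian would go with lower semantic alignment for that meaning than the smaller gap between Spanish and English.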

Universally translatable

To be fair, the study found that there are some “universally translatable” words, though not, as might have been expected, words associated with natural or very concrete meanings. Instead, domains with fewer dimensions by which to organize terms were most alignable; namely, number words, temporal terms, and common kinship terms:

“Although kinship systems vary, terms denoting close kin relations are organized along a few dimensions, such as gender (son/daughter, mother/father) and generation (grandmother/mother/daughter). This low dimensionality seems to enable high alignment.”

This explanation points to perhaps the most compelling part of the study, in which researchers applied another algorithm to quantify the overall similarity of the cultures behind each pair of languages.

The algorithm analyzed the proportion of cultural traits in common, based on an anthropological dataset of 92 non-linguistic cultural traits of 39 societies, and compared features such as marriage practices, legal systems, and political organization of speakers.
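The "proportion of cultural traits in common" can be illustrated with a short sketch. It assumes each society is described by a set of categorical traits and that traits missing for either society are skipped. The trait names, values, and the cultural_similarity function are hypothetical and are not taken from the anthropological dataset.

```python
def cultural_similarity(traits_a, traits_b):
    """Share of traits, among those recorded for both societies,
    on which the two societies have the same value."""
    comparable = [
        trait for trait in traits_a
        if trait in traits_b
        and traits_a[trait] is not None
        and traits_b[trait] is not None
    ]
    if not comparable:
        return float("nan")  # nothing to compare
    matches = sum(traits_a[t] == traits_b[t] for t in comparable)
    return matches / len(comparable)

# Hypothetical trait records; None marks a trait with no recorded value.
society_a = {
    "marriage_practice": "monogamy",
    "legal_system": "codified",
    "political_organization": "state",
    "subsistence_type": "agriculture",
}
society_b = {
    "marriage_practice": "monogamy",
    "legal_system": "customary",
    "political_organization": "state",
    "subsistence_type": None,
}

similarity = cultural_similarity(society_a, society_b)
print(f"proportion of shared traits: {similarity:.2f}")
```

Skipping unrecorded traits keeps a gap in the data from being counted as a cultural difference, at the cost of comparing some pairs of societies on fewer traits than others.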

Within semantic domains, the cultural correlation was strongest for words related to food and drink, time, animals, and the body. Cultural similarity also corresponded to higher alignment for specific cultural domains. For example, two cultures with a similar “subsistence type” show higher semantic alignment in related cultural domains, such as “food and drink,” “animals,” and “agriculture and vegetation.”

For 19 Indo-European languages for which detailed information on historical and geographical proximity was available, researchers also found that historical proximity correlated more closely with semantic alignment than did geographical proximity. Ultimately, they concluded that “semantic alignment between languages is better predicted by cultural similarity than by the geographical proximity of the populations who speak them.”

1. If the structure of language vocabularies mirrors the structure of natural divisions that are universally perceived, then the meanings of words in different languages should closely align.

2. By contrast, if shared word meanings are a product of shared culture, history and geography, they may differ between languages in substantial but predictable ways.

3. Here, the team analysed the semantic neighborhoods of 1,010 meanings in 41 languages. The most-aligned words were from semantic domains with high internal structure (number, quantity and kinship). Words denoting natural kinds, common actions and artefacts aligned much less well.

4. Languages that are more geographically proximate, more historically related and/or spoken by more-similar cultures had more aligned word meanings. These results provide evidence that the meanings of common words vary in ways that reflect the culture, history and geography of their users.
