Automatic cognate identification based on a fuzzy combination of string similarity measures

Soto Montalvo, Eduardo García Pardo, Raquel Martinez, Victor Fresno

January 2012

Abstract

Cognates are words in different languages that have similar spelling and meaning. Identifying cognates is useful for many Natural Language Processing tasks, as well as for learning a second language. This paper presents a new approach that classifies pairs of words into two classes: cognates/false friends, or unrelated words. The proposed approach uses a fuzzy system to combine complementary string similarity measures in order to improve cognate identification. The underlying hypothesis is that combining different string measures by applying heuristic knowledge can outperform the same measures used separately. The results obtained by the proposed system confirm this hypothesis, and the system also outperforms other systems that combine string measures using a supervised approach. As an additional contribution, we have created a bilingual Spanish-English test data set, freely available for research purposes, that includes pairs of cognates, false friends, and unrelated words.
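The abstract does not say which string measures, membership functions, or fuzzy rules the system uses, so the following is only a minimal sketch of the general idea under stated assumptions. The three measures (normalized Levenshtein similarity, longest common subsequence ratio, and bigram Dice coefficient), the ramp memberships `high`/`low`, the three rules, the zero-order Sugeno-style weighted-average output, and the 0.5 threshold are all hypothetical choices made for illustration; none of them is taken from the paper.

```python
import unicodedata
from collections import Counter


def levenshtein(a, b):
    """Dynamic-programming edit distance (insert/delete/substitute, each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def bigrams(word):
    """Multiset of adjacent character pairs."""
    return Counter(word[i:i + 2] for i in range(len(word) - 1))


def similarity_measures(a, b):
    """Three string similarities in [0, 1] (hypothetical choice of measures)."""
    # NFC normalization so accented letters such as 'ó' count as one character.
    a = unicodedata.normalize("NFC", a.lower())
    b = unicodedata.normalize("NFC", b.lower())
    longest = max(len(a), len(b)) or 1
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    return {
        "edit": 1.0 - levenshtein(a, b) / longest,
        "lcsr": lcs_length(a, b) / longest,
        "dice": 2 * sum((ba & bb).values()) / total if total else float(a == b),
    }


def high(x):
    # Hypothetical ramp membership: 0 below 0.3, 1 above 0.7, linear in between.
    return min(1.0, max(0.0, (x - 0.3) / 0.4))


def low(x):
    return 1.0 - high(x)


def fuzzy_cognate_score(a, b):
    """Weighted average of rule outputs (1.0 = related, 0.0 = unrelated)."""
    m = similarity_measures(a, b)
    rules = [
        # IF edit is high AND lcsr is high THEN related
        (min(high(m["edit"]), high(m["lcsr"])), 1.0),
        # IF dice is high AND (edit is high OR lcsr is high) THEN related
        (min(high(m["dice"]), max(high(m["edit"]), high(m["lcsr"]))), 1.0),
        # IF edit is low AND dice is low THEN unrelated
        (min(low(m["edit"]), low(m["dice"])), 0.0),
    ]
    weight = sum(w for w, _ in rules)
    if weight == 0:
        return 0.0  # no rule fired; treat the pair as unrelated
    return sum(w * out for w, out in rules) / weight


def classify(a, b, threshold=0.5):
    """Hypothetical decision threshold on the fuzzy score."""
    return "cognate/false friend" if fuzzy_cognate_score(a, b) >= threshold else "unrelated"


print(classify("nation", "nación"))  # cognate/false friend
print(classify("dog", "perro"))      # unrelated
```

For "nation"/"nación" the bigram Dice coefficient alone is low (0.2), but the edit-distance and subsequence rules still fire strongly, giving a score of 11/12. This is the kind of complementarity between measures that the abstract's hypothesis relies on, although the real system's measures and rules may differ.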

Type

Conference paper

Publication

2012 IEEE International Conference on Fuzzy Systems
