Automatic cognate identification based on a fuzzy combination of string similarity measures

Abstract

Cognates are words in different languages that have similar spelling and meaning. Identifying cognates is useful for many Natural Language Processing tasks, as well as for learning a second language. This paper presents a new approach to classifying pairs of words as either cognates/false friends or unrelated words. The proposed approach uses a fuzzy system to combine complementary string similarity measures in order to improve cognate identification. The underlying hypothesis is that combining different string measures through heuristic knowledge can outperform any of those measures used separately. The results obtained by the proposed system confirm this hypothesis; moreover, the system also outperforms other systems that combine string measures using a supervised approach. As an additional contribution, we have created a bilingual test data set of cognate, false-friend, and unrelated word pairs in Spanish and English, which is freely available for research purposes.
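The abstract does not name the specific measures, membership functions, or rules the system uses. The following is a minimal sketch of the general idea only, assuming three common measures (normalized Levenshtein similarity, bigram Dice coefficient, and longest common subsequence ratio), hypothetical ramp membership functions with thresholds 0.3 and 0.7, and a small hand-written zero-order Sugeno rule base. The function names (`fuzzy_score`, `classify`, etc.) and all rules are illustrative assumptions, not the paper's method.

```python
"""Sketch: fuzzy combination of string similarity measures (hypothetical rules)."""
from typing import Set, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def bigrams(s: str) -> Set[str]:
    return {s[i:i + 2] for i in range(len(s) - 1)}


def similarities(a: str, b: str) -> Tuple[float, float, float]:
    """Return (normalized Levenshtein, bigram Dice, LCS ratio), each in [0, 1]."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b)) or 1
    lev = 1.0 - levenshtein(a, b) / longest
    ba, bb = bigrams(a), bigrams(b)
    dice = 2 * len(ba & bb) / (len(ba) + len(bb)) if (ba or bb) else float(a == b)
    lcsr = lcs_length(a, b) / longest
    return lev, dice, lcsr


def high(x: float, lo: float = 0.3, hi: float = 0.7) -> float:
    """Ramp membership for 'similarity is high' (thresholds are assumptions)."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def low(x: float) -> float:
    """Complement of high()."""
    return 1.0 - high(x)


def fuzzy_score(a: str, b: str) -> float:
    """Zero-order Sugeno inference: AND = min, output = weighted mean of consequents."""
    lev, dice, lcsr = similarities(a, b)
    # (rule firing strength, crisp consequent): 1.0 = related form, 0.0 = unrelated
    rules = [
        (min(high(lev), high(lcsr)), 1.0),
        (min(high(dice), high(lcsr)), 1.0),
        (min(low(lev), low(dice)), 0.0),
        (low(lcsr), 0.0),
    ]
    total = sum(w for w, _ in rules)
    return sum(w * z for w, z in rules) / total if total else 0.5


def classify(a: str, b: str, threshold: float = 0.5) -> str:
    """Label a word pair by thresholding the fuzzy score (threshold is an assumption)."""
    return "cognate/false friend" if fuzzy_score(a, b) >= threshold else "unrelated"


if __name__ == "__main__":
    for pair in [("family", "familia"), ("nation", "nación"), ("dog", "perro")]:
        print(pair, round(fuzzy_score(*pair), 3), classify(*pair))
```

In this sketch, "family"/"familia" fires only the "related" rules and scores 1.0, while "dog"/"perro" fires only the "unrelated" rules and scores 0.0. The point of the fuzzy layer is visible in "nation"/"nación": the bigram Dice coefficient alone is low (0.2), but the rules let the higher Levenshtein and LCS similarities outweigh it. Because spelling alone cannot tell a cognate from a false friend, the sketch reports both under one label, which is one reading of the abstract's "cognates/false friends" class.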

Publication
2012 IEEE International Conference on Fuzzy Systems
Eduardo García Pardo
Full Professor

One of the founders of the GRAFO research group, whose main line of research is the development of algorithms for optimization problems. This was also the topic of the researcher's doctoral thesis and is the area in which their most notable publications are framed.