Automatic cognate identification based on a fuzzy combination of string similarity measures

Abstract

Cognates are words in different languages that have similar spelling and meaning. Identifying cognates is useful for many Natural Language Processing tasks, and also for learning a second language. This paper presents a new approach to classifying pairs of words as either cognates/false friends or unrelated. The proposed approach uses a fuzzy system to combine complementary string similarity measures in order to improve cognate identification. The underlying hypothesis is that combining different string measures by applying heuristic knowledge can outperform those measures working separately. The results obtained by the proposed system confirm this hypothesis; furthermore, the system also outperforms other systems that combine string measures using a supervised approach. As an additional contribution, we have created a bilingual test data set that includes pairs of cognates, false friends and unrelated words in Spanish and English, and that is freely available for research purposes.
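The idea of combining string similarity measures through fuzzy rules can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two measures (normalized edit similarity and longest-common-subsequence ratio), the membership functions `high` and `low`, the four rules, and the function name `fuzzy_score` are all hypothetical. The sketch uses a zero-order Sugeno-style weighted average rather than whatever inference scheme the authors actually used.

```python
def edit_similarity(a: str, b: str) -> float:
    """1 minus the Levenshtein distance normalized by the longer length."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))


def lcs_ratio(a: str, b: str) -> float:
    """Longest common subsequence length divided by the longer length."""
    if not a or not b:
        return 1.0 if a == b else 0.0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / max(len(a), len(b))


def high(x: float) -> float:
    """Assumed membership in 'high similarity': ramps from 0 at 0.3 to 1 at 0.7."""
    return max(0.0, min(1.0, (x - 0.3) / 0.4))


def low(x: float) -> float:
    """Assumed membership in 'low similarity': complement of `high`."""
    return 1.0 - high(x)


def fuzzy_score(a: str, b: str) -> float:
    """Combine both measures with four hypothetical rules (weighted average)."""
    s1, s2 = edit_similarity(a, b), lcs_ratio(a, b)
    rules = [  # (firing strength, crisp output)
        (min(high(s1), high(s2)), 1.0),  # both high: likely cognate/false friend
        (min(low(s1), low(s2)), 0.0),    # both low: likely unrelated
        (min(high(s1), low(s2)), 0.5),   # measures disagree: undecided
        (min(low(s1), high(s2)), 0.5),
    ]
    total = sum(w for w, _ in rules)  # always > 0 because high + low = 1
    return sum(w * out for w, out in rules) / total


if __name__ == "__main__":
    for pair in [("family", "familia"), ("family", "mesa")]:
        print(pair, round(fuzzy_score(*pair), 3))
```

A score near 1 suggests an orthographically related pair and a score near 0 an unrelated one. Because this sketch looks only at spelling, it cannot by itself separate cognates from false friends, which matches the two-class setup described in the abstract.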

Publication
2012 IEEE International Conference on Fuzzy Systems
Eduardo García Pardo
Associate Professor (Profesor Titular de Universidad)

Founding member of the GRAFO research group, whose main line of research is the development of algorithms for optimization problems. This is the subject of his doctoral thesis and the area of his most notable publications.