Strategic oscillation for the balanced minimum sum-of-squares clustering problem

Abstract

In the age of connectivity, every person constantly produces large amounts of data: social network activity, trip information, work connections, etc. These data only become useful information if we are able to analyze them and extract their most relevant features, which depend on the field of analysis. This task is usually performed by clustering the data into groups of similar elements, with the aim of finding similarities and differences among them. However, the vast amount of data available makes traditional analysis obsolete for real-life datasets. This paper addresses the problem of dividing a set of elements into a predefined number of equally sized clusters. To do so, we propose a Strategic Oscillation approach combined with a Greedy Randomized Adaptive Search Procedure (GRASP). The computational experiments first tune the parameters of the algorithm and study the influence of the proposed strategies. Then, the best variant is compared with the current state-of-the-art method over the same set of instances. The results show the superiority of the proposal under two different clustering metrics: MSE (Mean Square Error) and the Davies-Bouldin index.

Publication
Information Sciences
Jesús Sánchez-Oro
Associate Professor

Associate Professor in the Computer Science Department and one of the senior researchers of the Group for Research on Algorithms For Optimization (GRAFO).

Abraham Duarte
Full Professor

Abraham Duarte is a Full Professor in the Computer Science Department at the Rey Juan Carlos University (Madrid, Spain). He has done extensive research at the interface between computer science, artificial intelligence, and operations research to develop solution methods based on Computational Intelligence (metaheuristics) for practical problems in operations-management areas such as logistics and supply chains, telecommunications, decision-making under uncertainty, and optimization of simulated systems.