An efficient heuristic algorithm for software module clustering optimization

Abstract

In the lifecycle of software projects, maintenance tasks usually entail 75% of the total costs, where most efforts are spent in understanding the program. To improve the maintainability of software projects, the code is often divided into components, which are then grouped in different modules following good design principles, lowering coupling and increasing cohesion. The Software Module Clustering Problem (SMCP) is an optimization problem that looks for maximizing the modularity of software projects in the context of Search-Based Software Engineering. In the SMCP, projects are often modeled as graphs. Therefore, the SMCP can be interpreted as a graph partitioning problem, which is proved to be NP-hard. In this work, we propose a new heuristic algorithm for software modularization, based on a Greedy Randomized Adaptive Search Procedure with Variable Neighborhood Descent. We present a three-fold categorization of neighborhoods for the SMCP and leverage domain-specific information to filter unpromising solutions. Our proposal has been successfully tested over a dataset of real software projects, outperforming the previous state-of-the-art approach in terms of Modularization Quality in very short computing times. Therefore, it could be integrated in software development tools to improve the quality of software projects in real time.

Publication
Journal of Systems and Software
Javier Yuste
Javier Yuste
Artificial Intelligence Phd Student
Abraham Duarte
Abraham Duarte
Full Professor

Abraham Duarte is Full Professor in the Computer Science Department at the Rey Juan Carlos University (Madrid, Spain). He has done extensive research in the interface between computer science, artificial intelligence, and operations research to develop solution methods based on Computational Intelligence (metaheuristics) for practical problems in operations-management areas such as logistics and supply chains, telecommunications, decision-making under uncertainty and optimization of simulated systems.