Maximally Diverse Grouping Problem

Gallego, Laguna, Martí and Duarte (2011)
Optsicom project, University of Valencia (Spain)

Problem Description

The maximally diverse grouping problem (MDGP) consists of grouping a set of M elements into G mutually disjoint groups in such a way that the diversity among the elements in each group is maximized. The diversity among the elements in a group is calculated as the sum of the individual distance between each pair of elements, where the notion of distance depends on the specific application context. The objective of the problem is to maximize the overall diversity, i.e., the sum of the diversity of all groups. Feo and Khellaf (1990) proved that the MDGP is NP-hard.

In order to formulate the MDGP in mathematical terms, we assume that each element can be represented by a set of attributes. Let s_ik be the state or value of the k^th attribute element, where k= 1,...,K and i=1,...,M. Then, the distance d_ij between element i and j may be simply defined by the Eucliden calculation:

We have identified two variants of the MDGP. The first one (MDGP1) is the better known and forces all groups to have the same number of elements, with S=M/G. The second variant (MDGP2) allows the size S_g of each group g to be in the interval [a_g,b_g], where a_g <= b_g for g = 1,...,G. Clearly, MDGP1 is a special case of the MDGP2 for which a_g = b_g for all g.

Both variants can be formulated as quadratic integer programs with binary variables x_ig that take the value of 1 if element i is in group g and 0 otherwise. A quadratic integer programming formulation of the MDGP1 is:

The objective function adds the distance of all pairs of elements that belong to the same group. The first set of constraints forces the assignment of each element to a group. The second set of constraints forces the size of all groups to be equal to S. In the more general case, MDGP2, the second set of constraints is replaced with:

State of the Art Methods

The most relevant approximated methods developed to solve this problem are:

LCW: Multistart algorithm composed by a random constructive and an improvement method. The improvement method is developed as a modified version of LC method due to Lofti and Cerveny (1991). Weitz and Lakshminarayanan (1998)
LSGA: Hybrid genetic algorithm that combines a genetic algorithm and a local search procedure. The genetic aspect of LSGA is based on the encoding scheme for grouping problems proposed by Falkenauer (1998). The local search within LSGA implements a best improvement strategy based on exchanging elements between groups. Fan et al. (2011)
T‐LCW: Multistart algorithm composed by a greedy constructive (GC) and T-LCW, a tabu version of LCW improvement method. Gallego et al. (2011)
SO: Multistart algorithm composed by a greedy constructive (GC) and strategic oscillation improvement based on T-LCW. Gallego et al. (2011)

A Java 6 implementation of all state of the art methods is available for download as binary. For usage instructions execute .jar file with the command:

java -jar mdgp_jors_2011.jar

Please make sure that you have Java 6 SE installed in your system.

Instances (MDGPLIB)

We have compiled a comprehensive set of 480 benchmark instances representative of the collections previously used for computational experiments in the MDGP. We call this benchmark MDPGLIB. Furthermore we have included new hard instances. A brief description of the origin and the characteristics of the set of instances follows:

RanReal: This set consists of 160 M x M matrices in which the distance values d_ij are real numbers generated using a Uniform distribution U(0,100). The number of elements M, the number of groups G and the limits of each group a_g and b_g are shown in Table 1. There are 20 instances for each combination of parameters (i.e., each row in Table 1), 10 for instances with equal group size (EGS) and 10 for instances with different group size (DGS). For the 10 instances in EGS, the group size is equal for all instances and is calculated as a_g = a_g = M / G. For the 10 instances in DGS, the limits of each group (a_g and b_g) for each instance are generated randomly in the predefined interval. That is, the value of a_g is generated in the interval [a_g^min,a_g^max] and the value of b_g is generated in the interval [b_g^min,b_g^max]. This data set was introduced by Fan et al. (2011) with M ranging from 10 to 240. The larger instances with M=480 and M=960 had been generated for inclusion in MDGPLIB.

Table 1. Summary of parameters to generate problem instances
M	G	EGS	DGS
M	G	a_g = a_g	a_g^min	a_g^max	b_g^min	b_g^max
10	2	5	3	5	5	7
12	4	3	2	3	3	5
30	5	6	5	6	6	10
60	6	10	7	10	10	14
120	10	12	8	12	12	16
240	12	20	15	20	20	25
480	20	24	18	24	24	30
960	24	40	32	40	40	48

RanInt: This set has the same structure and size as RanReal (shown in Table 1) but distances are generated with an integer Uniform distribution.
Geo: This set follows the same structure and size as the previous two, however, values are calculated as Euclidean distances between pair of points with coordinates randomly generated in [0,10]. The number of coordinates for each point is generated randomly in the 2 to 21 range.

Download MDGPLIB. The format of the instances is described here

Computational Experience

We have performed a computational comparison of the state of the art methods on 240 instances (5 EGS instances and 5 DGS instances for M=10 to 960). All methods were stopped using the same time limit, which varied according to problem size, as specified in Table 2.

Table 2. CPU time limits
M	Time (seconds)
<= 60	1
120	3
240	20
480	120
960	600

In this experiment we compute for each instance and each method the relative deviation Dev (in percent) between the best solution value, Value, obtained with the method and the best known value of this instance, BestValue. For each method, we also report the number of instances #Best for which the method obtains the best value.

All algorithms were implemented in Java 6 and the experiment was performed in a Intel Core 2 Quad CPU Q 8300 with 6 GiB of RAM and Ubuntu 9.04 64 bits OS.

Method	Dev	#Best	Score
LSGA	0.61%	82	339
LCW	1.01%	80	438
T-LCW	0.17%	141	108
SO	0.04%	192	55

Download the raw data used to calculate this table.

References

Fan, Z. P., Y. Chen, J. Ma and S. Zeng (2011) "A hybrid genetic algorithmic approach to the maximally diverse grouping problem", Journal of the Operational Research Society, vol. 62, pp. 92‐99.
Feo, T. and M. Khellaf (1990) "A class of bounded approximation algorithms for graph partitioning", Networks, vol. 20, pp. 181‐195.
Lotfi V. and R. Cerveny (1991) "A final exam scheduling package", Journal of the Operational Research Society, vol. 42, pp. 205‐216.
Weitz, R. R. and S. Lakshminarayanan (1998) "An empirical comparison of heuristic methods for creating maximally diverse groups", Journal of the Operational Research Society, vol. 49, no. 6, pp. 635‐646.
Gallego, M., Laguna, M., Martí, R. and Duarte, A. (2011) "Tabu Search with Strategic Oscillation for the Maximally Diverse Grouping Problem", Journal of the Operational Research Society. Accepted. Download.

Optsicom Project

Sections