Maximally Diverse Grouping Problem
Gallego, Laguna, Martí and Duarte (2011)
Optsicom project, University of Valencia (Spain)
Problem Description
The maximally diverse grouping problem (MDGP) consists of grouping a set of M elements into G mutually disjoint groups in such a way that the diversity among the elements in each group is maximized. The diversity among the elements in a group is calculated as the sum of the individual distance between each pair of elements, where the notion of distance depends on the specific application context. The objective of the problem is to maximize the overall diversity, i.e., the sum of the diversity of all groups. Feo and Khellaf (1990) proved that the MDGP is NP-hard.
In order to formulate the MDGP in mathematical terms, we assume that each element can be represented by a set of attributes. Let s_{ik} be the state or value of the k^{th} attribute element, where k= 1,...,K and i=1,...,M. Then, the distance d_{ij} between element i and j may be simply defined by the Eucliden calculation:
We have identified two variants of the MDGP. The first one (MDGP1) is the better known and forces all groups to have the same number of elements, with S=M/G. The second variant (MDGP2) allows the size S_{g} of each group g to be in the interval [a_{g},b_{g}], where a_{g} <= b_{g} for g = 1,...,G. Clearly, MDGP1 is a special case of the MDGP2 for which a_{g} = b_{g} for all g.
Both variants can be formulated as quadratic integer programs with binary variables x_{ig} that take the value of 1 if element i is in group g and 0 otherwise. A quadratic integer programming formulation of the MDGP1 is:
The objective function adds the distance of all pairs of elements that belong to the same group. The first set of constraints forces the assignment of each element to a group. The second set of constraints forces the size of all groups to be equal to S. In the more general case, MDGP2, the second set of constraints is replaced with:
State of the Art Methods
The most relevant approximated methods developed to solve this problem are:
- LCW: Multistart algorithm composed by a random constructive and an improvement method. The improvement method is developed as a modified version of LC method due to Lofti and Cerveny (1991). Weitz and Lakshminarayanan (1998)
- LSGA: Hybrid genetic algorithm that combines a genetic algorithm and a local search procedure. The genetic aspect of LSGA is based on the encoding scheme for grouping problems proposed by Falkenauer (1998). The local search within LSGA implements a best improvement strategy based on exchanging elements between groups. Fan et al. (2011)
- T‐LCW: Multistart algorithm composed by a greedy constructive (GC) and T-LCW, a tabu version of LCW improvement method. Gallego et al. (2011)
- SO: Multistart algorithm composed by a greedy constructive (GC) and strategic oscillation improvement based on T-LCW. Gallego et al. (2011)
A Java 6 implementation of all state of the art methods is available for download as binary. For usage instructions execute .jar file with the command:
java -jar mdgp_jors_2011.jar
Please make sure that you have Java 6 SE installed in your system.
Instances (MDGPLIB)
We have compiled a comprehensive set of 480 benchmark instances representative of the collections previously used for computational experiments in the MDGP. We call this benchmark MDPGLIB. Furthermore we have included new hard instances. A brief description of the origin and the characteristics of the set of instances follows:
- RanReal: This set consists of 160 M x M matrices in which the distance values d_{ij} are real numbers generated using a Uniform distribution U(0,100). The number of elements M, the number of groups G and the limits of each group a_{g} and b_{g} are shown in Table 1. There are 20 instances for each combination of parameters (i.e., each row in Table 1), 10 for instances with equal group size (EGS) and 10 for instances with different group size (DGS). For the 10 instances in EGS, the group size is equal for all instances and is calculated as a_{g} = a_{g} = M / G. For the 10 instances in DGS, the limits of each group (a_{g} and b_{g}) for each instance are generated randomly in the predefined interval. That is, the value of a_{g} is generated in the interval [a_{g}^{min},a_{g}^{max}] and the value of b_{g} is generated in the interval [b_{g}^{min},b_{g}^{max}]. This data set was introduced by Fan et al. (2011) with M ranging from 10 to 240. The larger instances with M=480 and M=960 had been generated for inclusion in MDGPLIB.
- RanInt: This set has the same structure and size as RanReal (shown in Table 1) but distances are generated with an integer Uniform distribution.
- Geo: This set follows the same structure and size as the previous two, however, values are calculated as Euclidean distances between pair of points with coordinates randomly generated in [0,10]. The number of coordinates for each point is generated randomly in the 2 to 21 range.
M | G | EGS | DGS | ||||
---|---|---|---|---|---|---|---|
a_{g} = a_{g} | a_{g}^{min} | a_{g}^{max} | b_{g}^{min} | b_{g}^{max} | |||
10 | 2 | 5 | 3 | 5 | 5 | 7 | |
12 | 4 | 3 | 2 | 3 | 3 | 5 | |
30 | 5 | 6 | 5 | 6 | 6 | 10 | |
60 | 6 | 10 | 7 | 10 | 10 | 14 | |
120 | 10 | 12 | 8 | 12 | 12 | 16 | |
240 | 12 | 20 | 15 | 20 | 20 | 25 | |
480 | 20 | 24 | 18 | 24 | 24 | 30 | |
960 | 24 | 40 | 32 | 40 | 40 | 48 |
Download MDGPLIB. The format of the instances is described here
Computational Experience
We have performed a computational comparison of the state of the art methods on 240 instances (5 EGS instances and 5 DGS instances for M=10 to 960). All methods were stopped using the same time limit, which varied according to problem size, as specified in Table 2.
M | Time (seconds) |
---|---|
<= 60 | 1 |
120 | 3 |
240 | 20 |
480 | 120 |
960 | 600 |
In this experiment we compute for each instance and each method the relative deviation Dev (in percent) between the best solution value, Value, obtained with the method and the best known value of this instance, BestValue. For each method, we also report the number of instances #Best for which the method obtains the best value.
All algorithms were implemented in Java 6 and the experiment was performed in a Intel Core 2 Quad CPU Q 8300 with 6 GiB of RAM and Ubuntu 9.04 64 bits OS.
Method | Dev | #Best | Score |
---|---|---|---|
LSGA | 0.61% | 82 | 339 |
LCW | 1.01% | 80 | 438 |
T-LCW | 0.17% | 141 | 108 |
SO | 0.04% | 192 | 55 |
Download the raw data used to calculate this table.
References
- Fan, Z. P., Y. Chen, J. Ma and S. Zeng (2011) "A hybrid genetic algorithmic approach to the maximally diverse grouping problem", Journal of the Operational Research Society, vol. 62, pp. 92‐99.
- Feo, T. and M. Khellaf (1990) "A class of bounded approximation algorithms for graph partitioning", Networks, vol. 20, pp. 181‐195.
- Lotfi V. and R. Cerveny (1991) "A final exam scheduling package", Journal of the Operational Research Society, vol. 42, pp. 205‐216.
- Weitz, R. R. and S. Lakshminarayanan (1998) "An empirical comparison of heuristic methods for creating maximally diverse groups", Journal of the Operational Research Society, vol. 49, no. 6, pp. 635‐646.
- Gallego, M., Laguna, M., Martí, R. and Duarte, A. (2011) "Tabu Search with Strategic Oscillation for the Maximally Diverse Grouping Problem", Journal of the Operational Research Society. Accepted. Download.