PILOH – Desarrollo de una herramienta software para la resolución de problemas de ingeniería lingüística mediante optimización heurística

Principal investigator: Abraham DuarteFunding entities: URJC y Comunidad de Madrid (URJC-CM-2006-CET-0603)Duration: 01/01/2007 – 31/12/2007Abstract: 

El tratamiento y organización de la enorme cantidad de información en formato electrónico de la que se dispone actualmente se han convertido en una necesidad dentro de esta Sociedad de la Información en la que vivimos. En consecuencia, no tendría sentido disponer de grandes repositorios de información de carácter lingüístico en los que no pudiéramos extraer un conocimiento útil. Este proyecto se enmarca dentro del área de la optimización heurística aplicada a problemas del Procesamiento de Lenguaje Natural e Ingeniería Lingüística. En este proyecto se desarrollará un software para la optimización de diferentes problemas de optimización con el objetivo de dar solución a dos de los grandes problemas a los que se enfrenta actualmente el campo del Procesamiento del Lenguaje Natural: la clasificación automática y el agrupamiento, o clustering, de documentos.

Los problemas que se pretenden abordar están basados en modelos estructurados; es decir, en los que se conoce una descripción o formulación matemática completa. Se propondrán diferentes modelos de resolución eficientes basados en procedimientos metaheurísticos. Los métodos que se propongan serán comparados con los mejores métodos de resolución existentes para ese tipo de problemas, tanto en el ámbito académico como en el comercial. Se pretende que esto dé lugar, tanto a una aplicación que proporcione soluciones de gran calidad, como a publicaciones científicas de impacto internacional.

COSYO – COmplex SYstem Optimization

Principal investigator: Abraham DuarteFunding entities: URJC y Comunidad de Madrid (URJC-CM-2008-CET-3731)Duration: 01/01/2009 – 31/12/2009Abstract: 

Existe un tipo de problemas de optimización especialmente difíciles de resolver en los que se dispone sólo de información parcial, denominados Sistemas Complejos. En ellos no se tiene una descripción explícita del problema ya que algunos de sus elementos característicos, como son la función objetivo o las restricciones, se obtienen de forma indirecta. Como consecuencia, éstos se tratan como una caja negra.

El proyecto de investigación se centrará en el diseño de un Solver genérico (Context-Independent Solver) para la optimización de sistemas complejos mediante técnicas metaheurísticas. El Solver desarrollado generará soluciones como entrada a la caja negra y posteriormente analizará el resultado devuelto, extrayendo información sobre las soluciones, de tal forma que iterativamente se vayan generando soluciones de mayor calidad

Para diseñar el Solver, en primer lugar se categorizarán los problemas en función de si están descritos mediante variables enteras, permutaciones de elementos o variables continuas. Posteriormente, se diseñará un método basado en metaheurísticas para resolver cada tipo de problema. El último paso del diseño del Solver genérico consistirá en la integración de los tres métodos en un único esquema general que seleccionará el más adecuado para la resolución de cada problema. El Solver se complementará con una implementación del mismo en una herramienta denominada COSYO.

COSYO será un Solver genérico para la optimización de sistemas complejos modelados como una caja negra. Se considerarán dos perfiles de usuario de la herramienta. Por un lado, investigadores o profesionales con conocimientos de optimización (usándolo como librería de programación) y, por otro lado, profesionales que no tengan conocimientos avanzados en optimización (usándolo desde la hoja de cálculo de OpenOffice.org)

Los métodos propuestos en el desarrollo del proyecto se compararán con los mejores métodos existentes para ese tipo de problemas tanto en el ámbito académico como en el comercial. Esto dará lugar tanto a una aplicación que proporcione soluciones de gran calidad como a publicaciones científicas de impacto internacional.

MA2VICMR – Mejorando el Acceso, el Análisis y la Visibilidad de la Información y los Contenidos Multilingües y Multimedia en Red para la Comunidad de Madrid

Principal investigator: Abraham DuarteFunding entities: Comunidad de Madrid y Fondo Social Europeo (S2009/TIC-1542)Duration: 01/01/2010 – 31/12/2013Abstract: 

Los sistemas de acceso a la información multimedia que trabajan sobre colecciones de imágenes suelen tener acceso a dos tipos de datos: los descriptores textuales y el contenido visual de las imágenes. Tradicionalmente, estos sistemas han abordado o bien el problema de la recuperación de imágenes analizando la información textual asociada (Text­Based Information Retrieval, TBIR) o bien analizando el contenido visual (Content­Based Information Retrieval, CBIR). Hasta hace unos pocos años, las aproximaciones mixtas no aportaban ninguna ventaja a los resultados, además de ser bastante ineficientes.

Por un lado, investigadores de NLP&IR­UNED y del grupo de Vision Team de la Universitat de Valencia coordinaron su experiencia previa en recuperación textual y la basada en contenido de imágenes. Fruto de los trabajos de esta colaboración, ha sido una aproximación que no solo se aprovecha de la sinergia entre los aspectos visuales y de las anotaciones textuales conjuntamente, sino que además aporta un método de cálculo eficiente para la búsqueda de imágenes anotadas en grandes colecciones, a partir de una consulta multimedia, ya sea texto y una o varias imágenes. Este trabajo ha generado, además de participaciones en competiciones como ImageCLEF y MediEval, varias publicaciones en actas de congresos, un artículo en la revista IEEE Transactions on Multimedia Journal y una tesis doctoral en el grupo NLP&IR­UNED titulada Fusión Multimedia Semántica Tardía aplicada a la Recuperación de Información Multimedia.

Por otro lado, otro equipo mixto formado por integrantes de NLP&IR­UNED y GAVAB­URJC han integrado tecnologías previas para construir un sistema híbrido de búsqueda de imágenes. La propuesta, que combinaba rasgos de contenido y análisis del texto enriquecido con recursos lingüísticos como WordNet, participó en dos ediciones de la competición Photo Annotation Task de ImageCLEF.

DIETHA – Diseño, Implementación y Explotación de Técnicas Heurísticas Avanzadas

Principal investigator: Abraham DuarteFunding entities: Ministerio de Economía y Competitividad (TIN2012-35632-C02-02)Duration: 01/01/2013 – 31/12/2015Abstract: 

There are a large amount of problems that are framed within the context of combinatorial optimization characterized by the high interest associated with their practical resolution. This project deals with five distinct families of combinatorial problems. These are:

  • Ordering problems: with applications in VLSI design or in the efficient resolution of systems of equations.
  • Location problems: with interest in telecommunication applications such as distribution of signal regenerators or network design.
  • Graph-based problems: with applications in the distribution of electronic devices in electronic boards or in image segmentation.
  • Routing problems: by focusing on multi-objective problems with applications in the transport of hazardous materials or in recommendation systems.
  • Selection problems: with applications in the construction of diverse groups or clustering of documents.

The methodology to solve the problems described above are metaheuristics procedures, among which we can highlight evolutionary algorithms, tabu search, variable neighborhood search or GRASP, to name some of the best known. For each combinatorial problem, we will propose the most suitable metaheuristic according to their mathematical structure or model. We will mainly focus on design of novel strategies to obtain high quality solutions. Besides, it is expected to figure out general strategies that can be easily applied to other related problems. We will also focus on the efficient and flexible implementation of those strategies, taking advantage of new language programming features and multi-core microprocessors. Finally we will focus on exploitation through a management platform, which integrates the problems addressed above. Simultaneously, we will develop an application to put into production (in the companies interested in our research project) the algorithms developed during the project.

In addition to solve the problems presented above, a second objective of the project is to develop the metaheuristic methodologies themselves. To successfully meet this challenge the research team has researcher Nenad Mladenovic, which developed in conjunction with Pierre Hansen the variable neighborhood search methodology.

All these problems will be integrated into Optsicom, a software tool that allow the execution of algorithms devoted to solve optimization problems and analyze the associated results. Optsicom can be used at two levels: as an end-user or as a researcher on heuristic methods. In this line, the problems integrated in Optsicom will also be available via the Web at Optsicom Repository, a web platform for comprehensive management of optimization problems. This platform will post all the information associated with an optimization problem. For each problem, it is expected to store the description, algorithms, instances, experimental results and relevant references. Furthermore, the results obtained by executing the algorithms can be compared using different statistical tests that are available as part of the software tool.

Vertex Separation Problem

Duarte A., F. Escudero L., Martí R., Mladenovic N., Pantrigo J.J. and Sánchez-Oro J. (2012)
Optsicom project, University of Valencia (Spain)

Problem Description

Let G(V,E) be an undirected graph where V (n = |V|) and E (m = |E|) are the sets of vertices and edges, respectively. A linear layout φ of the vertices of G is a bijection or mapping φ : V → {1, 2, . . . , n} in which each vertex receives a unique and different integer between 1 and n. For vertex u, let φ(u) denote its position or label in layout φ. Let L(p,φ,G) be the set of vertices in V with a position in the layout φ lower than or equal to position p. Symmetrically, let R(p,φ,G) be the set of vertices with a position in the layout φ larger than position p. In mathematical terms,

Since layouts are usually represented in a straight line, where the vertex in position 1 comes first, L(p,φ,G) can be simply called the set of left vertices with respect to position p and, R(p,φ,G) the set of the right vertices w.r.t. p.

The Cut-value at position p of layout φ, Cut(p,φ,G), is defined as the number of vertices in L(p,φ,G) with one or more adjacent vertices in R(p,φ,G), then,

where N(u) = {v ∈ V : (u, v) ∈ E}. The Vertex Separation value (VS) of layout φ is the maximum of the Cut-value among all positions in layout φ: VS(φ,G) = maxp Cut(p,φ,G).

Figure 1.a shows an illustrative example of an undirected graph G with 7 vertices and 9 edges. Figure 1.b depicts a solution (layout) φ of this graph and the Cut-value of each position p, Cut(p,φ,G). For example, Cut(1,φ,G) = 1 because L(1,φ,G) = {D} and R(1,φ,G) = {A, F, G, E, B, C} and there is one vertex in Lhaving an adjacent vertex in R. Similarly, Cut(3,φ,G) = 2 where L(3,φ,G) = {D, A, F} and R(3,φ,G) = {G, E, B, C}. The objective function value, computed as the maximum of these cut values, is VS(G,φ) = 3 whose related position is p = 4.

State of the Art Methods

The decisional version of the VSP was proved to be NP-complete for general graphs (Lengauer, 1981). It is also known that the problem remains NP-complete for planar graphs with maximum degree of three (Monien and Sudborough, 1988), as well as for chordal graphs (Gustedt, 1993), bipartite graphs (Goldberg et al. 1995), grid graphs and unit disk graphs (Diaz et al. 2001).

We can find many different graph problems that, although stated in different terms, are equivalent to the VSP in the sense that a solution to one problem provides a solution to the other one. Some of them are the Path-Width problem (Kinnersley 1992), the Interval Thickness problem (Kirousis and Papadimitriou 1985), the Node Search Number (Kirousis and Papadimitriou 1986} and the Gate Matrix Layout (Kinnersley and Langston 1994). The equivalence between these problems is a consequence of the results presented in (Fellows and Langston, 1994, Kinnersley 1992 and Kirousis and Papadimitriou 1986). For any graph G let VS(G)PW(G)IT(G)SN(G) and GML(G) be the objective function value of the optimal solution for the Vertex Separation, Path-Width, Interval Thickness, Node Search Number and Gate Matrix Layout problems, respectively. These values verify the following relations:

VS(G) = PW(G) = IT(G) = SN(G) – 1 = GML(G) + 1.

The VSP appears in the context of finding “good separators” for graphs (Lipton and Tarjan 1979) where a separator is a set of vertices or edges whose removal separates the graph into disconnected subgraphs. This optimization problem has applications in VLSI design for partitioning circuits into smaller subsystems, with a small number of components on the boundary between the subsystems (Leiserson, 1980). The decisional version of the VSP consists of finding a vertex separation value larger than a given threshold. It has applications on computer language compiler design and exponential algorithms. In compiler design, the code to be compiled can be represented as a directed acyclic graph (DAG) where the vertices represent the input values to the code as well as the values computed by the operations within the code. An edge from node u to node v in this DAG represents the fact that value u is one of the inputs to operation v. A topological ordering of the vertices of this DAG represents a valid reordering of the code, and the number of registers needed to evaluate the code in a given ordering is precisely the vertex separation number of the ordering (Bodlaender et al., 1998). The decisional version of VSP has also applications in graph theory (Fomin and Hie, 2006). Specifically, if a graph has a vertex separation value, say w, then it is possible to find the maximum independent set of G in time O(2w n). Other practical applications include Graph Drawing and Natural Language Processing (Dujmovic et al., 2008, Miller, 1956).

Instances (VSPLIB)

We have experimented with three sets of instances, totalizing 173 instances:

  • HB: We derived 73 instances from the Harwell-Boeing Sparse Matrix Collection. This collection consists of a set of standard test matrices M = Muv arising from problems in linear systems, least squares, and eigenvalue calculations from a wide variety of scientific and engineering disciplines. The graphs are derived from these matrices by considering an edge (u, v) for every element Muv = 0. From the original set we have selected the 73 graphs with n ≤ 1000. The number of vertices and edges range from 24 to 960 and from 34 to 3721, respectively.
  • Grids: This set consists of 50 matrices constructed as the Cartesian product of two paths (Raspaud et al., 2009). They are also called two dimensional meshes and the optimal solution of the VSP for squared grids is known by construction, see [7]. Specifically, the vertex separation value of a square grid of size λ × λ is λ. For this set, the vertices are arranged on a square grid with a dimension λ × λ for 5 ≤ λ ≤ 54. The number of vertices and edges range from 5 × 5 = 25 to 54 × 54 = 2916 and from 40 to 5724, respectively.
  • Trees: Let T(λ) be set of trees with minimum number of nodes and vertex separation equal to λ. As it is stated in Ellis et al. (1994), there is just one tree in T(1), namely the tree with a single edge, and another one in T(2), the tree constructed with a new node acting as root of three subtrees that belong to T(1). In general, to construct a tree with vertex separation λ + 1 it is necessary to select any three members from T(λ) and link any one node from each of these to a new node acting as the root of the new tree. The number of nodes, n(λ), of a tree in T(λ) can be obtained using the recurrence relation n(λ) = 3n(λ – 1 ) + 1 where and n(1) = 2 (see Ellis et al. (1994) for additional details). We consider 50 different trees: 15 trees in T(3), 15 trees in T(4) and 20 trees in T(5). The number of vertices and edges range from 22 to 202 and from 21 to 201, respectively.

The VSPLIB contains 173 instances:

Download Vertex Separation Instances (VSPLIB).


Computational Experience

All the algorithms were implemented in Java SE 6 and the experiments were conducted on an Intel Core i7 2600 CPU (3.4 GHz) and 4 GB RAM. We have considered the VSPLIB instances described above. The individual results for each instance can be downloaded here in Excel format.

Vertex Separation results


References

  • Bodlaender HL, Gustedt J, Telle JA. Linear-time register allocation for a fixed number of registers. Proc. of teh Symposium on Discrete Algorithms, 1998; 574–583.
  • Díaz J., Penrose M.D., Petit J., Serna M. Approximating layout problems on random geometric graphs. Algorithms 2001; 39(1):78–116.
  • Dujmovic V, Fellows MR, Kitching M, Liotta G, Mccartin K, Nishimura N, Ragde P, Rosamond FA, Whitesides S, Wood DR. On the Parameterized Complexity of Layered Graph Drawing. Algorithmica2008; 52(2):267–292.
  • Ellis J.A., Sudborough I.H., Turner J.S.. The vertex separation and search number of a graph. Journal Information and Computation, 1994; 113:50–79.
  • Fellows MR, Langston MA. On search, decision and the efficiency of polynomial-time algorithms. Journal of Computing Systems Sciences 1994; 49(3):769–779
  • Fomin FV, Hie K. Pathwidth of cubic graphs and exact algorithms. Information Processing Letters 2006; 97:191–196.
  • Goldberg P.W., Golumbic M.C., Kaplan H., Shamir R. Four strikes against physical mapping of DNA. Journal of Computing Biology 1995; 12(1):139–152.
  • Gustedt J. On the pathwidth of chordal graphs. Discrete Applied Mathemastics 1993; 45(3):233–248.
  • Kinnersley N.G. The vertex separation number of a graph equals its path-width. Information Processing Letters 1992; 42(6):345–50.
  • Kinnersley NG, Langston MA. Obstruction set isolation for the gate matrix layout problem. Discrete Applied Mathematics 1994; 54(2-3):169–213.
  • Kirousis M, Papadimitriou C.H. Interval graphs and searching. Discrete Mathematics 1985; 55(2):181–184.
  • Kirousis M, Papadimitriou C.H. Searching and pebbling. Theory of Computation Sciences 1986; 47(2):205–218.
  • Leiserson CE. Area-Efficient Graph Layouts (for VLSI). Proc. of IEEE Symposium on Foundations of Computer Science, 1980; 270–281.
  • Lengauer T. Black-White Pebbles and Graph Separation. Acta Informatica 1981; 16:465–475.
  • Lipton RJ, Tarjan RE. A separator theorem for planar graphs. SIAM Journal of Applied Mathematics1979; 36:177–189.
  • Miller GA. The Magical Number Seven, Plus or Minus Two. SIAM Journal of Applied Mathematics 1956; 13:81–97.
  • Monien B., Sudborough I.H. Min Cut is NP-Complete for Edge Weighted Treees. Theory of Computatiuon Sciences 1988;58:209–229.
  • Raspaud A., Schröder H., Sýkora O., Török L., and Vrt’o I. Antibandwidth and cyclic antibandwidth of meshes and hypercubes. Discrete Mathematics, 2009; 309:3541–3552.

Linear Ordering Problem

Martí, Reinelt and Duarte (2009)
University of Valencia (Spain), University of Heidelberg (Germany) and University Rey Juan Carlos (Spain)

Problem Description

Let Dn = (VnAn) denote the complete digraph on n nodes, where Vn is the set of nodes and An the set of arcs. A tournament T in An consists of a subset of arcs containing for every pair of nodes i and j either arc (ij) or arc (ji), but not both. T is an acyclic tournament if it does not contain any directed cycle. Obviously, an acyclic tournament induces an ordering < vi1vi2,…,vin > of the nodes (and vice versa). Node vi1 is the one with no entering arcs in Tvi2 has exactly one entering arc, etc., and vin is the node with no outgoing arc. Given arc weights wij for every pair ij in Vn, the linear ordering problem (lop) consists of finding an acyclic tournament T in An such that the sum of the weights of arcs in T is maximal, or in other words, of finding an ordering of the nodes such that the sum of the weights of the arcs compatible with this ordering is maximal.

Given an (nn) matrix C = (cij) the triangulation problem is to determine a simultaneous permutation of the rows and columns of C such that the sum of superdiagonal entries becomes as large as possible (or equivalently, the sum of subdiagonal entries is as small as possible). Note, that it does not matter if diagonal entries are taken into account or not. Obviously, by setting arc weights wij = cij for the complete digraph Dn, the triangulation problem forC can be solved as linear ordering problem Dn. Conversely,a linear odering problem for Dn can be transformed to a triangulation problem for an (nn) matrix C by setting cij = wij and the diagonal entries cii = 0 (or to arbitrary values).

The LOP can be formulated as 0/1 linear integer programming problem as follows. We use 0/1 variables xij, for (ij) inAn, stating whether arc (ij) is present in the tournament or not. Taking into account that a tournament is acyclic if and only if it does not contain any dicycle of length 3, it is easily seen that the LOP can be formulated as the 0/1-IP.

State of the Art Methods

The most relevant metaheuristics developed to solve this problem are:

  • TS: Tabu Search. Laguna et al (1999)
  • MA: Memetic Algorithm. Schiavinotto and Stützle (2004)
  • VNS: Variable Neighbourhood Search. García et al (2006)
  • SA: Simulated Annealing. Charon and Hudry (2007)
  • SS: Scatter Search. Campos et al (2001)
  • GRASP: Greedy ramdomized adaptive search procedure. Campos et al (2001)

Instances (LOLIB)

We have compiled a comprehensive set of benchmark problems including all problem instances which have so far been used for conducting computational experiments for the \lop. Furthermore we have included new instances. In their original definition, some problem instances are not in normal form. For the computations documented here, all problems have been transformed to normal form. We give a brief description of the origin and the characteristics of the groups of problems.

  • Input/Output matrices: This is a well-known set of instances that contains 50 real-world linear ordering problems generated from input-output tables from various sources (Grötschel et al 1984). They are comparatively easy for nowadays metaheuristics and are thus more of interest for economists than for the assessment of approximate methods for hard problems. The original entries in these tables were not necessarily integral, but for \LOLIB\ they were scaled to integral values. Download.
  • SGB instances: These instances are taken from the Stanford GraphBase and consist of input-output tables from sectors of the economy of the United States (Knuth 1993). The set has a total of 25 instances with 75 sectors. Download.
  • Random instances of type A: This is a set with 175 random problems that has been widely used for experiments. Problems of type I (called RandomAI), are generated from a [0,100] uniform distribution. This type of problems was proposed in Reinetl (1985) and generated in Campos et al (2001). Problems were originally generated from a [0, 25000] uniform distribution in Laguna et al (1999) and modified afterwards, sampling from a significatively narrow range ([0,100]) to make them harder to solve. Sizes are n = 100, 150 and 200 and there are 25 instances in each set giving a total of 75. We have extended this set including 25 additional instances with sizen = 500. Download. Problems of type II, which we call RandomAII, are generated by counting the number of times a sector appears in a higher position than another in a set of randomly generated permutations. This type of problems was proposed in Chanas and Kobylanski (1996) and generated in Campos el al (2001). For a problem of size nn/2 permutations are generated. There are 25 instances with sizes 100, 150 and 200, respectively. Download.
  • Random instances of type B: For these random problems, the superdiagonal entries are drawn uniformly distributed from the interval [0,U1] and the subdiagonal entries from [0,U2], where U1 >=e U2Download.
  • Instances of Mitchell and Borchers: These instances have been used by Mitchell and Borchers for their computational experiments (Mitchell and Borchers, 2000). They are random matrices where the subdiagonal entries are uniformly distributed in [0,99] and the superdiagonal entries are drawn uniformly from [0,39]. Furthermore a certain percentage of the entries was zeroed out. Download.
  • Instances of Schiavinotto and Stützle: Some further benchmark instances have been created and used by Schiavinotto and Stützle (2004). These instances were generated from the real-world input-output tables by replicating them to obtain larger problems. Thus, the distribution of numbers in these instances somehow reflects real input-output tables, but otherwise they behave more like random problems. Tha data set has been called XLOLIB, instances with n = 150 and n = 250 are available. For each original input-output instance, two instances, one of size n =150 and another one of size n = 250 were generated. Therefore this set contains 98 instances (49 with size 150 and 49 with size 250). We have removed 20 of these instances because there entries were so large that the sum of entries was not representable as 4-byte integer. Therefore, this set finally has 78 instances. Download.
  • Further special instances: We added some further problem instances that were used for experiments in some publications. EX instances were used in particular in Christof (1997) and in Christof and Reinelt (1996).econ instances were generated from the matrix usa79. They turned out not to be solvable as linear program using only 3-dicycle inequalities. atp instances were created from the results of ATP tennis tournaments in 1993/1994. Nodes correspond to a selection of players and the weight of an arc (ij) is the number of victories of player i against player j. Paley graphs have been used in Goemans and Hall (1996) to prove results about the acyclic subdigraph polytope. They are a special class of tournaments where adjacency comes from an algebraic definition. They are constructed from the members of a suitable finite field by connecting pairs of elements that differ in a quadratic residue. Download.

Computational Experience

We have performed an intensive experimentation with the best know methods (run for 2 hours) to compute the best known values for the instances. We have also reviewed related literature searching for better values than these. The final best known values for the instances can be downloaded here.

We performed a computational comparison of the state of the art methods on the instances. We have considered two different time limits in our comparative: 10 seconds (to measure the aggressiveness) and 10 minutes (to measure the robustness). Both experiments were performed in a Dual Intel Xeon at 3.06 GHz with 3.2 GiB of RAM. It can be downloaded in Excel.


References

  • Campos, V., Glover, F. Laguna, M. and Martí, R.: An experimental evaluation of a scatter search for the linear ordering problem, Journal of Global Optimizationz 21 (2001), 397–414
  • Chanas, S. and Kobylanski, P.: A new heuristic algorithm solving the linear ordering problem, Computational Optimization and Applications 6 (1996), 191–205
  • Charon, I. and Hudry, O.: A survey on the linear ordering problem for weighted or unweighted tournaments, 4OR 5 (2007), 5–60.
  • Christof, T.: Low-dimensional 0/1-polytopes and branch-and-cut, Combinatorial Optimization,Shaker, 1997
  • Christof, T. and Reinelt, G.: Combinatorial optimization and small polytopes, Top 4 (1996), 1-64
  • García, C.G., Pérez-Brito, D., Campos, V. and Martí, R.: Variable neighborhood search for the linear ordering problem, Computers and Operations Research 33 (2006),3549–3565
  • Goemans, M.X.and Hall, L.A.: The strongest facets of the acyclic subgraph polytope are unknown,Proc. of the 5th Int. IPCO Conference, LNCS 1084, Springer, 1996, 415-429
  • Grötschel, M., Jünger, M. and Reinelt, G.: A cutting plane algorithm for the linear ordering problem, Operations Research 32 (1984), 1195–1220.
  • Knuth, D.E.: The Stanford GraphBase: a platform for combinatorial computing, Addison-Wesley, 1993.
  • Laguna, M., Martí R. and Campos, V.: Intensification and diversification with elite tabu search solutions for the linear ordering problem, Computers and Operations Research 26 (1999), 1217–1230.
  • Mitchell, J.E., Borchers, B: Solving linear ordering problems, in: Frenk, H., Roos, K., Terlaky, T.Zhang, s. (eds.), High Performance Optimization, Applied Optimization Vol. 33) Kluwer, 2000, 340–366
  • Reinelt, G.: The linear ordering problem: algorithms and applications, research and exposition in mathematics 8, Heldermann, 1985.
  • Schiavinotto, T. and Stützle, T.: The linear ordering problem: Instances, search space analysis and algorithms, Journal of Mathematical Modelling and algorithms 3 2004, 367–402.

Cutwidth Minimization Problem

Martí R., Pantrigo J.J., Duarte A. and Pardo E.G. (2010)
Optsicom project, University of Valencia (Spain)

Problem Description

The Cutwidth Minimization Problem (CMP) is an NP-hard problem (Gavril 1977) and consists of finding a linear layout of a graph so that the maximum linear cut of edges (i.e., the number of edges that cut a line between consecutive vertices) is minimized. The cutwidth minimization problem can be easily described in mathematical terms. Given a graph G=(V,E) with n=|V| and m=|E|, a labeling or linear arrangement f of G assigns the integers {1,2,…,n} to the vertices in V, where each vertex receives a different label. The cutwidth of a vertex v with respect to fCWf(v), is the number of edges (u,w) ∈ Esatisfying f(u)≤f(v)<f(w). The cutwidth of the graph, CWf(G), is the maximum of the cutwidth of its vertices:

Figure 1.a shows an example of an undirected graph with 6 vertices and 7 edges. Figure 1.b shows a labeling, f, of the graph in Figure 1.a, setting the vertices in a line with the order of the labeling, as commonly represented in the CMP. In this way, since f(C)=1, vertex C comes first, followed by vertex A(f(A)=2) and so on. We represent f with the ordering (C, A, D, E, B, F) meaning that vertex C is located in the first position (label 1), vertex A is located in the second position (label 2) and so on. In Figure 1.b, the cutwidth of each vertex is represented as a dashed line with its corresponding value. For example, the cutwidth of vertex C is CWf(C) = 1, because the edge (C,B) has and endpoint in C labeled with 1 and the other endpoint in a vertex labeled with a value larger than 1. In a similar way, we can compute the cutwidth of vertex ACWf(A)=4, by counting the appropriate number of edges ((C,B), (A,B), (A,E), and (A,D)). Then, since the cutwidth of the graph GCWf(G), is the maximum of the cutwidth of all vertices in V, in this particular example we obtain CWf(G)=CWf(D)=5.

The CMP can be formulated as 0/1 linear integer programming problem as follows (Luttamaguzi et al., 2005):

where xik is a decision binary variable whose indices are i, k ∈ {1,2,…,n}. This variable specifies whether i is placed in position k in the ordering. In other words, for all xik (i, k={1, 2,…,n})they take on value 1 if and only if i occupies the position k in the ordering; otherwise xik takes on value 0. Constraints (3) and (4) ensure that each vertex is only assigned to one position and one position is only assigned to one vertex respectively. Consequently, contraints (1), (2), (3) and (4) together implies that a solution of the problem is an ordering.

The decision binary variable yikjl is defined as xik ∧ xjl, where i, j ∈ {1,2,…,n}, (vivj) ∈ E and k, l ∈ {1,2,…,n} the labels associated to vertex vi and vj respectively. In the linear formulation above this conjunction is computed with constraints (5), (6) and (7).

Constraint (8) computes for each position c in the ordering, the number of edges whose origin is placed in any position k (1 ≤ k < c) and destination in any position l (c < l ≤ n). The cutwidth problem consists of minimizing the maximum number of cutting edges in any position, c ∈{1,…,n – 1} of the labeling. Therefore, the objective function b must be larger than or equal to this quantity.

State of the Art Methods

The most relevant heuristic methods developed to solve this problem are:

  • GRASP+Path Relinking: Hybrid method that combines GRASP methodology with Path-Relinking. Andrade and Resende (2007a).
    • GRASP+Evolutionary Path Relinking: Hybrid method that combines GRASP methodology with Evolutionary Path-Relinking. Andrade and Resende (2007b).
    • Simulated Annealing : Simulated Annealing based on different constructive methods and local search procedures. Cohoon and Sahni (1987)
    • Scatter Search : Scatter Search method based on a GRASP constructive algorithm, a local search strategy based on insertion moves and voting-based combination methods. Pantrigo et al. (2012).
    • Tabu Search : A Tabu Search method used to help a Branch-and-bound algorithm. Palubeckis and Rubliauskas (2012).
    • Variable Neighbourhood Search : A new variant of the Variable Neighbourhood Search framework called Variable Formulation Search for the Cutwidth Minimization Problem. Pardo et al. (2013). (Results)

Instances (CMPLIB)

We have compiled three sets of instances for our experimentation, totalizing 252 instances. The first one, Small, was introduced in Martí et al. (2008), the second one, Grids, was introduced in Rolim et al. (1995) and the third one, Harwell-Boeing, is a subset of the public-domain Matrix Market library (available at http://math.nist.gov/MatrixMarket/data/Harwell-Boeing/). Next, it can be found a brief description of the origin and characteristics of the sets of instances:

  • Small: This data set consists of 84 graphs introduced in the context of the bandwidth reduction problem. The number of vertices ranges from 16 to 24, and the number of edges ranges from 18 to 49.
  • Grids: This data set consists of 81 matrices constructed as the Cartesian product of two paths (Raspaud et al., 2009). They are also called two dimensional meshes and, as documented in Raspaud et al. (2009), the optimal solution of the cutwidth problem for these types of instances is known by construction. For this set of instances, the vertices are arranged on a grid with a dimension width x height where width, height are selected from the set {3, 6, 9, 12, 15, 18, 21, 24, 27}.
  • Harwell-Boeing: We derived 87 instances from the Harwell-Boeing Sparse Matrix Collection. This collection consists of a set of standard test matrices M = (Mij) arising from problems in linear systems, least squares, and eigenvalue calculations from a wide variety of scientific and engineering disciplines. Graphs are derived from these matrices by considering an edge (i,j) for every element Mij ≠ 0. From the original set we have selected the 87 graphs with n ≤ 700. Their number of vertices ranges from 30 to 700 and the number of edges from 46 to 41686.

The CMPLIB contains 252 instances:

Download Cutwidth Instances (CMPLIB).

Computational Experience

We have performed an intensive experimentation with the best know methods to compute the best known values for the instances. In addition, we have performed a computational comparison of the state of the art methods on the reported instances. The experiment was performed in a Intel Core 2 Quad CPU Q 8300 with 6 GiB of RAM and Ubuntu 9.04 64 bits OS. Results and best known values for instances can be reviewed in Pantrigo et al. (2010) and Martí et al. (2010) and can be downloaded here.

References

  • Andrade, D.V. and M.G.C. Resende. GRASP with path-relinking for network migration scheduling. Proceedings of International Network Optimization Conference, 2007a.
  • Andrade, D.V. and M.G.C. Resende. GRASP with evolutionary path-relinking.Proceedings of Seventh Metaheuristics International Conference (MIC), 2007b.
  • Cohoon, J. and S. Sahni. Heuristics for the Board Permutation Problem.Journal of VLSI and Computer Systems, 2, 37- 61, 1987.
  • Gavril, F. Some NP-complete problems on graphs. In Proceedings of the 11th conference on information Sciences and Systems, 91-95, 1977.
  • Luttamaguzi, J., M. Pelsmajer, Z. Shen and B. Yang. Integer Programming Solutions for Several Optimization Problems in Graph Theory. Technical report, DIMACS, 2005.
  • Martí, R., V. Campos and E. Piñana, Branch and Bound for the Matrix Bandwidth Minimization. European Journal of Operational Research, 186:513-528, 2008.
  • Martí, R., J.J. Pantrigo, A. Duarte and E.G. Pardo. Branch and Bound for Cutwidth Minimization Problem. Computers & Operations Research. Volume 40, Issue 1, Pages 137–149, 2013
  • Palubeckis, G., D. Rubliauskas. A branch-and-bound algorithm for the minimum cut linear arrangement problem. Journal of Combinatorial Optimization. Volume 24, Issue 4, Pages 540-563, 2012
  • Pantrigo, J.J., R. Martí, A. Duarte and E.G. Pardo. Scatter Search for the Cutwidth Minimization Problem Annals of Operations Research. Volume 199, Issue 1, Pages 285-304, 2012.
  • Pardo, E.G., Mladenovic N., Pantrigo, J.J., Duarte A. Variable Formulation Search for the Cutwidth Minimization Problem. Applied Soft Computing. Volume 13, Issue 5, Pages 2242–2252, 2013.
  • Raspaud, A., H. Schröder, O. Sýkora, L. Török, and I. Vrt’o. Antibandwidth and cyclic antibandwidth of meshes and hypercubes. Discrete Mathematics. 309:3541-3552, 2009.
  • Rolim, J., O. Sýkora and I. Vrt’o. Cutwidth of the de Bruijn graph. RAIRO Informatique Théorique et Applications, 29(6):509-514, 1995.

Bandwidth Coloring Problem

Martí, Gortázar and Duarte (2009)
Optsicom project, University of Valencia (Spain)

Problem Description

The bandwidth coloring problem consists of assigning a color to each vertex of a graph, so that the absolute value of the difference between the colors of adjacent vertices is at least the value of the weight of the associated edge. This problem generalizes the classical vertex coloring problem.

Let G=(V,E) be a graph with a vertex set V (|V| = n) and an edge set E (|E|= m) with a strictly positive integer weight dij associated to each edge (ij) ∈ E. A k-coloring c of G labels each vertex i ∈ V with an integer c(i) ∈ {1, 2, …, k} (called color) in such a way that |c(i) – c(j)| ≥ dij for all (ij) ∈ E. The bandwidth coloring problem (BCP) consists of finding a k coloring with the smallest value of k. In mathematical terms:

The classical vertex coloring problem (VCP) is a particular case of the BCP in which dij=1 for all (ij) ∈ E. The BCP is therefore NP-hard since it generalizes the VCP (Garey and Johnson 1979).

State of the Art Methods

The most relevant approximated methods developed to solve this problem are:

  • FCNS: Constructive method that performs forward checking to avoid forbidden colorings. Combines a local search with a constraint propagation algorithm. Prestwich (2002)
  • Multistart: A multi-start method that first generates a sequence of nodes and then produces a solution of the BCP with a greedy algorithm. Then, it tries to improve this solution by reducing the number of colors used by one unit below the number of colors in the best solution known so far. Lim et al. (2005)
  • DSATUR+T1: A combination of evolutionary and tabu search methodologies. Malaguti and Toth (2008)
  • AMP: GRASP algorithm with an ejection chain improvement method that uses memory structures. Martí, Gortázar and Duarte (2009)

Instances (BCPLIB)

The instance set GEOM contains the instances reported in most of the previous BCP papers. GEOM consists of 33 geometric graphs generated by Michael Trick. In these graphs, points are generated in a 10,000 by 10,000 grid and are connected by an edge if they are close enough together. Edge weights are inversely proportional to the distance between nodes. This set contains three types of graphs. The GEOMn instances are sparse, the GEOMna and GEOMnb instances are denser.

These instances can be downloaded in a bundle.

Computational Experience

We performed a computational comparison of the state of the art methods. We have implemented the methods in Java SE 6 and the experiment was conducted on a Pentium 4 computer at 3 GHz with 2 GB of RAM. It can be downloaded in Excel.

References (Best Methods)

  • Garey, M.R. and D.S. Johnson (1979) Computers and Intractability. A guide to the theory of completeness, W. H. Freeman and Company, New York.
  • Prestwich, S. (2002) “Constrained bandwidth multicoloration neighborhoods” Computational symposium on graph coloring and its generalizations, Ithaca, NY, 126-133.
  • Lim, A., Y. Zhu, Q. Lou and B. Rodrigues (2005) “Heuristic Methods for Graph Coloring Problems,” The 20th Annual ACM Symposium on Applied Computing (SAC 2005, Santa Fe, New Mexico) March 13-17.
  • Malaguti, E. and P. Toth (2008) “An evolutionary approach for bandwidth multicoloring problems”,European Journal of Operational Research 189, 638-651.
  • Martí, R., F. Gortázar and A. Duarte (2009) “Heuristics for the Bandwidth Coloring Problem”,International Journal of Metaheuristics, to be published.

Antibandwidth Maximization Problem

Duarte, Martí Resende and Silva (2010)

Lozano, Duarte, Gortázar, Martí (2012)
Optsicom project, University of Valencia (Spain)

Problem Description

Let G = (VE) be an undirected graph, where V denotes the set of vertices and E the set of edges. Let n= |V| and m = |E|. A labeling f of the vertices of G is a one-to-one mapping from the set V onto the integers {1, 2, . . . , n}, i.e. each vertex v in V has a unique label f(v) in {1, 2, . . . , n}. Note that each labeling can be identified with a permutation of {1, 2, . . . , n}.

In some practical problems it is interesting to maximize the minimum difference between adjacent vertices. For example, if the vertices of the graph G represent sensitive facilities or chemicals, then placing them too close together can be risky. This optimization problem is dual to the well-known bandwidth problem and it is commonly known as the antibandwidth problem, also known as the separation problem (Leung et al., 1984) and the dual bandwidth problem (Yixun and Jinjiang, 2003).

Given a graph G and a labeling f, the antibandwidth ABf(G) of f is defined as:

ABf(G) = min{ABf(v) : v in V}

where ABf(v) = min{|f(u) − f(v)| : u in N(v)} is the antibandwidth of vertex v and N(v is the set of its adjacent vertices. Then, the Antibandwidth reduction problem consists in maximizing the value of ABf(G) over all possible labelings

State of the Art Methods

Previous papers on the antibandwidth problem where devoted to the theoretical study of its properties to find optimal solutions for special cases. In the variant studied in Leung et al. (1984), the problem consists in finding a labeling f with a value ABf(G) greater than a predetermined value k. The authors determined the NP-completeness of this problem and give polynomial time algorithms for several classes of graphs. Yixun and Jinjiang (2003) proposed several upper bounds for AB(G). Some of them are related to the independent and chromatic numbers of G. The bound UB1 is computed as a function of the minimum degree (mind) and maximum degree (maxd) of G.

where mind = min |N(u)| and maxd = max |N(u)| with u int V.

A weaker upper bound, UB2, is computed as a function of the number n of vertices and the number m of edges:

Raspaud et al. (2009) solve the antibandwidth problem for several classes of special graphs: two dimensional meshes (Cartesian product of two paths), tori (Cartesian product of two cycles), and hypercubes. Torok and Vrto (2007) extended these results to the case of three dimensional meshes. More recently, Dobrev et al. (2009) derive tight upper bounds for general Hamming graphs and optimal solution values for a special class of these graphs. Until now, no heuristic has been proposed to obtain high-quality solutions for the antibandwidth problem on general graphs.

Instances

In this section several instance sets are described. These intance set have been used in the related literature to compare the performance of methods for the Antibandwidth Problem:

  • Harwell-Boeing Graphs: This data set consists of 24 instances (12 smaller and 12 larger) derived from the Harwell-Boeing Sparse Matrix Collection. Graphs are derived from these matrices as follows. Let Mij denote the element of the i-th row and j-th column of the n×n sparse matrix M. The corresponding graph has n vertices. Edge (ij) exists in the graph if and only if Mij is not equal to 0. These instances were used in Duarte et al. (2010) with name Harwell-Boeing. Download.
  • Grid Graphs: This data set consists of 24 (12 smaller and 12 larger) matrices constructed as the Cartesian product of two paths (Raspaud et al., 2009). They are also called twodimensional meshes and, as documented in Raspaud et al. (2009), the optimal solutions of the antibandwidth problem for these types of instances are known by construction. These instances were used in Duarte et al. (2010) with name Grid Graphs. Download.
  • Hamming Graphs: This data set consists of 24 graphs (12 smaller and 12 larger) constructed as the Cartesian product of d complete graphs Knk, for k = 1, 2, . . . , d (Dobrev et al.,2009). The vertices in these graphs are d-tuples (i1i2, . . . , id), where ik in {0, 1, . . . , nk−1}. Two vertices (i1i2, . . . , id) and (j1j2, . . . , jd) are adjacent if and only if the two tuples differ in exactly one coordinate. These graphs as referred to as Hamming graphs. It is shown in Dobrev et al. (2009) that if d ≥ 2 and 2 ≤ n1 ≤ n1 ≤ · · · ≤ nd, then the optimal solution of the antibandwidth problem for this type of instance can be calculated. These instances were used inDuarte et al. (2010) with name Grid Graphs. Download.
  • Paths: This data set consists of 24 graphs constructed as a linear arrangement of vertices such that every vertex has a degree of two, except the first and the last vertex that have a degree of one. The size of these instances ranges from 50 to 1000. Download.
  • Cycles: This data set consists of 24 graphs constructed as a circular arrangement of vertices such that every vertex has a degree of two. The size of these instances ranges from 50 to 1000. Download.
  • Toroidal grids: This data set consists of 37 graphs constructed as the Cartesian product of two cycles (i.e., C n × C n ). The size of these instances ranges from 16 to 1600. Download.
  • Hypercubes: This data set consists of seven hypercubes Q d . Each vertex is represented as a binary vector of length d. Two vertices are adjacent if their associated binary vectors only differ in one element. Download.
  • Complete binary trees: This data set consists of 24 trees, where every tree-level is completely filled except possibly the last level and all nodes are as far left as possible. The size of these instances ranges from 30 to 950. Download.
  • Caterpillars: This data set consists of 40 graphs. Each caterpillar, P n 1 n 2 is constructed using the path P n 1 and n 1 copies of the path P n 2 (usually referred to as “hairs”), where each vertex i in P n 1 is connected to the first vertex of the i-th copy of the path P n 2 . The size of these instances ranges from 20 to 1000. Download.
  • 3D grids: This data set consists of 8 graphs constructed as the Cartesian product of three paths P n 1 , P n 2 and P n 3 (Raspaud et al. 2009). The size of these instances ranges from 27 to 1000. They are also called three dimensional meshes or cuboidal meshes. Download.

Computational Experience

We have performed an intensive experimentation with the best known heuristic to compute the best known values for the instances. We have also reviewed related literature searching for better values than these. The final best known values for the instances can be downloaded in Excel.

References

  • Dobrev, S., R. Kralovic, D. Pardubska, L. Torok, and I. Vrto. Antibandwidth and cyclic antibandwidth of Hamming graphs. Electronic Notes in Discrete Mathematics, 34:295–300, 2009.
  • Duarte, A. and R. Martí, M.G.C. Resende and R.M.A. Silva. GRASP with Path Relinking for the Antibandwidth Problem”, Networks. In press, 2010.
  • Lozano, M., Duarte, A., Gortázar, F., and Martí, R. “Variable neighborhood search with ejection chains for the antibandwidth problem”, Journal of Heuristics, 18(6), 919–938, 2012.
  • Donnelly, S., and G. Isaak. Hamiltonian powers in threshold and arborescent comparability graphs.
  • Discrete Mathematics, 202:33–44, 1999.
  • Leung, J.Y.-T., O. Vornberger, and J.D. Witthoff. On some variants of the bandwidth minimization problem. SIAM J. on Computing, 13:650–667, 1984.
  • Raspaud, A.,, H. Schroder, O. Sykora, L. Torok, and I. Vrto. Antibandwidth and cyclic antibandwidth of meshes and hypercubes. Discrete Mathematics, 309: 3541–3552, 2009.
  • Torok, L., and I. Vrto. Antibandwidth of 3-dimensional meshes. Electronic Notes in Discrete Mathematics, 28:161–167, 2007.
  • L. Yixun and Y. Jinjiang. The dual bandwidth problem for graphs. J. of Zhengzhou University, 35:1–5, 2003.

Vertex Bisection Problem

Herrán A., Colmenar J. M., Duarte A. (2018)
Optsicom project, University of Valencia (Spain)

Problem Description

The Vertex Bisection Problem (VBP) belongs to the family of well-known graph partitioning problems, where the main goal is to find a partition of the vertices maximizing or minimizing a given objective function. In particular, VBP consists of dividing a graph into two parts B and B’ with the same number of vertices, or differing in one if the number of vertices is odd, such as the vertex width (VW), defined as the number of vertices in B with at least one adjacent in B’, is minimized. As example, Figure (a) shows a connected undirected graph with 10 vertices and 13 edges. Figures (b) and (c) show two feasible solutions for the VBP with a vertex width of 2 and 3 respectively.

More formally, given a graph G = (V,E) where V is the set of vertices, |V| = n, and E is the set of edges, a feasible solution of the VBP is defined by two subsets of nodes B and B’ (with B ∩ B’ = ∅ and B ∪ B’ = V) where the size of each subset verifies that B = ⌊n/2⌋ and B’ = n – ⌊n/2⌋, respectively. Hence, given a solution S = (B, B’), its vertex width is computed as:

Let 𝒮 be the whole set of feasible partitions of V according to the aforementioned constraints. Therefore, the VBP consists of finding the solution S* ∈ 𝒮 that minimizes the vertex width.

State of the Art Methods

This problem has been dealt with from both, exact and heuristic perspectives. Among the exact methods, the most relevant works are:

  • B&B: Branch & Bound algorithm for the two proposed integer-linear programming models. Fraire (2014).
  • QCQP: Quadratically constrained quadratic program for a different integer-linear programming model than the proposed in (Fraire, 2014). Jain (2016a).
  • B&B: Branch & Bound algorithm that incorporates a greedy heuristic to produce an upper bound for the cost of the solutions. Jain (2016b).

The most relevant heuristic methods developed to solve this problem are:

  • CH: Different constructive heuristics. Castillo-Garcia (2015).
  • GA: Genetic algorithm. Huacuja (2016).
  • MA: Memetic algorithm. Pallavi (2016).

Instances

Our experimental experience is developed on several sets of instances with different number of vertices, edges, and densities. These sets contain Small-Full (84), Trees (15), Grids (5) and HB (4) instances used in Fraire (2014), and the Small (23), Harwell-Boeing (36) and Hypercubes (8) instances used in (Pallavi, 2016).

We stored the resulting 175 instances as a library that we called VBP_LIB:

Download VBP_LIB instances.


Computational Experience

All the algorithms were implemented in C++ and the experiments were conducted on an Intel i7 6500U processor running at 2.5 GHz with 8 Gb of RAM using Windows 10. We have considered the VBP_LIB instances described above. The individual results for each instance can be downloaded here in Excel format.

Vertex Bisection problem results

References

  • H. Fraire, J. Terán-Villanueva, N.C. García, J. Gonzalez-Barbosa, E.R. del Angel, Y. Gómez-Rojas. Exact Methods for the Vertex Bisection Problem. Springer International Publishing (2014), pp. 567–577.
  • P. Jain, G. Saran, K. Srivastava. A new integer linear programming and quadratically constrained quadratic programming formulation for vertex bisection minimization problem. J. Autom. Mobile Rob. Intell. Syst. 10(1) (2016a), pp. 69–73.
  • P. Jain, G. Saran, K. Srivastava. Branch and bound algorithm for vertex bisection minimization problem. In: R.K. Choudhary, J.K. Mandal, N. Auluck, H.A. Nagarajaram (Eds.), Advanced Computing and Communication Technologies, Proceedings of the 9th ICACCT (2015), Springer Singapore, Singapore (2016b), pp. 17–23.
  • N. Castillo-García, H.J.F. Huacuja, J.A.M. Flores, R.A.P. Rangel, J.J.G. Barbosa, J.M.C. Valadez. Comparative Study on Constructive Heuristics for the Vertex Separation Problem. Springer International Publishing, Cham (2015), pp. 465–474.
  • H.J.F. Huacuja, N. Castillo-García. Optimization of the vertex separation problem with genetic algorithms. IGI Global (2016), pp. 13–31.
  • J. Pallavi, G. Saran, K. Srivastava. On minimizing vertex bisection using a memetic algorithm. Inf. Sci. 369 (2016), pp. 765–787.