# EMIGO – Metaheurísticas Eficientes para la Optimización en Grafos

Principal investigators: Abraham Duarte and José Manuel Colmenar Verdugo. Funding entities: Ministerio de Ciencia, Innovación y Universidades (Ref. PGC2018-095322-B-C22). Duration: 01/01/2019 – 31/12/2021.

Information systems are nowadays commonly represented with graphs, which makes them easier to interpret and understand. A graph is an abstract data structure consisting of nodes and edges, where the nodes, sometimes also referred to as vertices, represent entities, and the edges are links connecting the nodes that represent relationships among them. From a theoretical perspective, a graph is an ordered pair G = (V, E), where V is the set of vertices and E is a set of edges, each edge associating two vertices in V. Graphs of this type are usually referred to as undirected and unweighted. Graphs whose edges have an orientation (i.e., given two vertices u, v in V, there may exist an arc from u to v but not the reverse) are known as directed graphs. Finally, weighted graphs are those where a number (weight) is assigned to each edge. These weights might represent costs, lengths, or capacities, depending on the problem at hand. Some authors call such a graph a network.
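The three graph variants just described can be sketched with adjacency structures. This is a minimal illustration, not code from the project; all names are ours.

```python
from collections import defaultdict

# Undirected, unweighted: G = (V, E); each edge is stored in both directions.
undirected = defaultdict(set)
def add_edge(u, v):
    undirected[u].add(v)
    undirected[v].add(u)

# Directed: an arc u -> v does not imply the reverse arc v -> u.
directed = defaultdict(set)
def add_arc(u, v):
    directed[u].add(v)

# Weighted ("network"): a number is attached to every edge.
weighted = defaultdict(dict)
def add_weighted_edge(u, v, w):
    weighted[u][v] = w
    weighted[v][u] = w

add_edge("a", "b")
add_arc("a", "b")
add_weighted_edge("a", "b", 3.5)
```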

Graphs are the basic modeling unit in a wide variety of areas within computer science and operations research, such as project management, production scheduling, line balancing, business planning, or software visualization. Graph theory offers a rich source of problems and techniques for optimization, programming, and data structure development. Designing good heuristics or approximation algorithms for graph optimization problems often requires significant specialized knowledge and trial and error. In many real-world applications, the same optimization problem is solved again and again on a regular basis, maintaining the same problem structure but differing in the data. This provides an opportunity to learn heuristic algorithms that exploit the structure of such recurring problems. We study reinforcement learning and graph embedding to address this challenge. The learned greedy policy behaves like a meta-algorithm that incrementally constructs a solution, with each action determined by the output of a graph embedding network capturing the current state of the solution.
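The greedy meta-algorithm described above can be sketched as a generic construction loop: a solution grows one node at a time, with each step chosen by a scoring function. Here a hand-written heuristic score stands in for the learned graph-embedding network, and the toy problem (vertex cover on a tiny graph) is ours, not the project's.

```python
def greedy_construct(nodes, edges, score):
    """Incrementally add the highest-scoring node until all edges are covered."""
    solution, uncovered = set(), set(edges)
    while uncovered:
        best = max(nodes - solution,
                   key=lambda v: score(v, solution, uncovered))
        solution.add(best)
        uncovered = {e for e in uncovered if best not in e}
    return solution

# Stand-in score (replacing the embedding network): the number of
# still-uncovered edges that adding node v would cover.
def coverage_score(v, solution, uncovered):
    return sum(1 for e in uncovered if v in e)

nodes = {1, 2, 3, 4}
edges = [frozenset(e) for e in [(1, 2), (1, 3), (1, 4), (2, 3)]]
cover = greedy_construct(nodes, edges, coverage_score)
```

In the learned setting, `coverage_score` would be replaced by the output of the embedding network evaluated on the current partial solution.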

The main objective of this project is to solve difficult optimization problems (NP-hard in scientific terms; see Garey and Johnson, 1990) on a graph modeling a real situation. A difficult optimization problem is one for which it is not possible to guarantee that the optimal solution will be found within a reasonable computational time. There are many difficult problems in practice, which triggers the development of efficient procedures capable of finding good solutions even when these solutions cannot be proven optimal. These methods, for which solution quality is as important as the computing time needed to find it, are known as heuristics. Nowadays, a family of modern and intelligent heuristics, usually known as metaheuristics (Glover, 1986), are master strategies that guide subordinate algorithms to produce solutions beyond those normally obtained in a quest for local optimality. Metaheuristics are among the most prominent and successful techniques for solving a large number of complex and computationally hard combinatorial and numerical optimization problems. Both research groups have long experience in the development and application of these methodologies. In this project, we propose applying metaheuristic methodologies to solve difficult graph-based problems arising in business and industry.

# CYNAMON – Cybersecurity, Network Analysis and Monitoring for the Next Generation Internet

Principal investigator: Abraham Duarte. Funding entities: Comunidad de Madrid, Unión Europea (Fondos Estructurales) (Ref. P2018/TCS-4566). Duration: 07/10/2019 – 31/08/2020.

Cyberspace plays nowadays a key role in modern societies and economies, and its protection is a pivotal challenge in national security strategies. Over the last decade, various technological developments have made our dependency on cyberspace even greater, including the generation and processing of massive amounts of data, the influence of social networks over all activities of our daily lives, and the trend to connect virtually every real-world device to the Internet.

The scientific program proposed in this project aims at contributing to a more secure cyberspace in our current and future technological context. Our approach identifies three paradigms, each with a varying degree of maturity, that will reshape cybersecurity in the coming years: the processing of massive amounts of information (big data), including data generated by citizens and devices; the embedding of computers in essentially all real-life objects (cyber-physical systems) and their connection to the Internet (IoT, Internet of Things); and the challenges and opportunities associated with the rise of quantum computing. To address these challenges, we propose an interdisciplinary work program involving five research groups with proven expertise in the areas of system and application security, data analysis, next-generation communication systems, and cryptography.

The project establishes three general objectives closely linked to the three challenges discussed above. First, we will develop advanced data analysis techniques to deal with massive amounts of data, with a focus on two key application domains: social networks and new-generation networks. Second, we will propose security mechanisms and tools for connected IoT devices and the network services they access. Finally, we will explore cryptographic techniques providing information security services and protecting users' privacy. This will be done by leveraging new techniques such as blockchain technology, as well as by addressing the threat posed by quantum computers.

Expected results include a significant advance in the research lines sketched above. In addition, harmonization between research and innovation tasks is ensured by the presence in the consortium of relevant industry partners (BBVA, Vicomtech, and Unisys) with a keen interest in developing market-fitting technologies, together with several highly relevant institutional partners (INCIBE, Jefatura de Información de la Guardia Civil, and Centro Criptológico Nacional). Thus, it is expected that this project's results will be transferred and exploited as soon as they are developed, contributing to improved national cybersecurity at all levels. Consequently, our expected results have very high technological relevance, since they will provide tools for a more trustworthy cyberspace for citizens, companies, and Public Administrations.

The project goals are perfectly aligned with Spanish and European priorities for the development of secure environments, which aim at strengthening citizens' rights and improving the competitiveness of our industries and our defense capabilities.

# DIETHA II – Diseño, Implementación y Explotación de Técnicas Heurísticas Avanzadas

Principal investigator: Abraham Duarte. Funding entities: Ministerio de Economía y Competitividad (TIN2015-65460-C2-2-P). Duration: 01/01/2016 – 31/12/2018.

A large number of problems framed within the context of combinatorial optimization are characterized by the high interest associated with their practical resolution. This project deals with five distinct families of combinatorial problems:

• Ordering problems: with applications in VLSI design or in the efficient resolution of systems of equations.
• Location problems: with interest in telecommunication applications such as distribution of signal regenerators or network design.
• Graph-based problems: with applications in the distribution of electronic devices in electronic boards or in image segmentation.
• Routing problems: by focusing on multi-objective problems with applications in the transport of hazardous materials or in recommendation systems.
• Selection problems: with applications in the construction of diverse groups or clustering of documents.

The problems described above will be solved with metaheuristic procedures, among which we can highlight evolutionary algorithms, tabu search, variable neighborhood search, and GRASP, to name some of the best known. For each combinatorial problem, we will propose the most suitable metaheuristic according to its mathematical structure or model. We will mainly focus on the design of novel strategies to obtain high-quality solutions. In addition, we expect to identify general strategies that can easily be applied to other related problems. We will also focus on the efficient and flexible implementation of these strategies, taking advantage of new programming language features and multi-core microprocessors. Finally, we will focus on exploitation through a management platform that integrates the problems addressed above. Simultaneously, we will develop an application to put the algorithms developed during the project into production in the companies interested in our research.
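As a concrete illustration of one of the methodologies named above, the following is a minimal GRASP sketch (greedy randomized construction with a restricted candidate list, followed by a swap-based local search) applied to a toy selection problem: pick p items maximizing total value. The problem instance and parameter names are illustrative only, not taken from the project.

```python
import random

def grasp(values, p, alpha=0.3, iters=20, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(iters):
        # Construction: build a restricted candidate list (RCL) of items
        # whose value is within alpha of the greedy best, then pick at random.
        remaining, sol = dict(values), set()
        while len(sol) < p:
            hi, lo = max(remaining.values()), min(remaining.values())
            threshold = hi - alpha * (hi - lo)
            rcl = [k for k, v in remaining.items() if v >= threshold]
            pick = rng.choice(rcl)
            sol.add(pick)
            del remaining[pick]
        # Local search: swap an inside item for a strictly better outside one
        # until no improving swap exists.
        improved = True
        while improved:
            improved = False
            for i in list(sol):
                for j in set(values) - sol:
                    if values[j] > values[i]:
                        sol.remove(i)
                        sol.add(j)
                        improved = True
                        break
                if improved:
                    break
        score = sum(values[k] for k in sol)
        if best is None or score > best[0]:
            best = (score, sol)
    return best

values = {"a": 5, "b": 9, "c": 2, "d": 7, "e": 4}
score, sol = grasp(values, p=2)
```

For this trivial objective, the local search always converges to the two largest values; in a real problem, the construction and improvement moves would be tailored to the problem structure, as the project proposes.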

In addition to solving the problems presented above, a second objective of the project is to advance the metaheuristic methodologies themselves. To successfully meet this challenge, the research team includes Nenad Mladenovic, who developed the variable neighborhood search methodology together with Pierre Hansen.

All these problems will be integrated into Optsicom, a software tool that allows the execution of algorithms devoted to solving optimization problems and the analysis of the associated results. Optsicom can be used at two levels: by end users and by researchers in heuristic methods. Along this line, the problems integrated in Optsicom will also be available on the Web through the Optsicom Repository, a web platform for the comprehensive management of optimization problems. This platform will publish all the information associated with an optimization problem: for each problem, it is expected to store the description, algorithms, instances, experimental results, and relevant references. Furthermore, the results obtained by executing the algorithms can be compared using different statistical tests available as part of the tool.
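To illustrate the kind of statistical comparison mentioned above, here is a self-contained two-sided sign test on paired objective values of two algorithms over a set of instances. The actual tests shipped with Optsicom are not reproduced here, and the data below is invented for illustration.

```python
from math import comb

def sign_test(results_a, results_b):
    """Two-sided sign test p-value for paired objective values (lower is better)."""
    wins_a = sum(a < b for a, b in zip(results_a, results_b))
    wins_b = sum(b < a for a, b in zip(results_a, results_b))
    n = wins_a + wins_b                  # ties are discarded
    k = min(wins_a, wins_b)
    # P(the weaker side wins at most k times) under a fair coin, doubled
    # for a two-sided test.
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)

# Invented objective values of two algorithms on eight instances.
alg_a = [10, 12, 9, 14, 11, 13, 10, 12]
alg_b = [11, 13, 10, 15, 12, 14, 11, 12]
p_value = sign_test(alg_a, alg_b)   # algorithm A wins 7 of 7 non-tied pairs
```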

# PILOH – Desarrollo de una herramienta software para la resolución de problemas de ingeniería lingüística mediante optimización heurística

Principal investigator: Abraham Duarte. Funding entities: URJC and Comunidad de Madrid (URJC-CM-2006-CET-0603). Duration: 01/01/2007 – 31/12/2007.

The processing and organization of the enormous amount of information currently available in electronic format has become a necessity in the Information Society we live in. Consequently, it would make no sense to have large repositories of linguistic information from which no useful knowledge could be extracted. This project falls within the area of heuristic optimization applied to problems in Natural Language Processing and Linguistic Engineering. The project will develop software for solving different optimization problems, with the goal of addressing two of the major problems currently faced by the field of Natural Language Processing: automatic classification and document clustering.

The problems to be addressed are based on structured models, i.e., models for which a complete mathematical description or formulation is known. Different efficient resolution models based on metaheuristic procedures will be proposed. The proposed methods will be compared with the best existing resolution methods for this type of problem, both academic and commercial. This is expected to result both in an application that provides high-quality solutions and in scientific publications of international impact.

# COSYO – COmplex SYstem Optimization

Principal investigator: Abraham Duarte. Funding entities: URJC and Comunidad de Madrid (URJC-CM-2008-CET-3731). Duration: 01/01/2009 – 31/12/2009.

There is a class of optimization problems that are especially difficult to solve because only partial information is available, known as Complex Systems. No explicit description of the problem exists, since some of its characteristic elements, such as the objective function or the constraints, are obtained indirectly. As a consequence, these systems are treated as a black box.

The research project will focus on the design of a generic, context-independent solver for the optimization of complex systems using metaheuristic techniques. The solver will generate solutions as input to the black box and then analyze the returned result, extracting information about the solutions so that solutions of increasing quality are generated iteratively.

To design the solver, problems will first be categorized according to whether they are described by integer variables, permutations of elements, or continuous variables. A metaheuristic-based method will then be designed to solve each type of problem. The last step in the design of the generic solver will be the integration of the three methods into a single general scheme that selects the most suitable one for each problem. The solver will be complemented by an implementation in a tool called COSYO.
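The dispatching scheme described above, which routes a black-box problem to a type-specific method according to its variables, can be sketched as follows. All names are illustrative, and the per-type method is a deliberately simplistic random search; this is not the actual COSYO implementation.

```python
import random

def solve(objective, spec, iters=200, seed=0):
    """Minimize a black-box objective, sampling by declared variable type."""
    rng = random.Random(seed)
    kind = spec["type"]
    if kind == "integer":
        sample = lambda: [rng.randint(lo, hi) for lo, hi in spec["bounds"]]
    elif kind == "permutation":
        sample = lambda: rng.sample(spec["elements"], len(spec["elements"]))
    elif kind == "continuous":
        sample = lambda: [rng.uniform(lo, hi) for lo, hi in spec["bounds"]]
    else:
        raise ValueError(f"unknown problem type: {kind}")
    # The objective is only ever evaluated, never inspected: a black box.
    best = min((sample() for _ in range(iters)), key=objective)
    return best, objective(best)

# Black-box usage: the solver sees only evaluations of the function.
sol, val = solve(lambda x: sum(v * v for v in x),
                 {"type": "integer", "bounds": [(-5, 5)] * 3})
```

In the full design, each branch would invoke a dedicated metaheuristic rather than uniform random sampling.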

COSYO will be a generic solver for the optimization of complex systems modeled as a black box. Two user profiles are considered: on the one hand, researchers or practitioners with optimization expertise (using it as a programming library) and, on the other, practitioners without advanced optimization knowledge (using it from the OpenOffice.org spreadsheet).

The methods proposed during the project will be compared with the best existing methods for this type of problem, both academic and commercial. This will result both in an application that provides high-quality solutions and in scientific publications of international impact.

# MA2VICMR – Mejorando el Acceso, el Análisis y la Visibilidad de la Información y los Contenidos Multilingües y Multimedia en Red para la Comunidad de Madrid

Principal investigator: Abraham Duarte. Funding entities: Comunidad de Madrid and Fondo Social Europeo (S2009/TIC-1542). Duration: 01/01/2010 – 31/12/2013.

Multimedia information access systems that work on image collections usually have access to two types of data: textual descriptors and the visual content of the images. Traditionally, these systems have addressed image retrieval either by analyzing the associated textual information (Text-Based Information Retrieval, TBIR) or by analyzing the visual content (Content-Based Information Retrieval, CBIR). Until a few years ago, mixed approaches brought no advantage to the results, besides being rather inefficient.

On the one hand, researchers from NLP&IR-UNED and the Vision Team group at the Universitat de València combined their previous experience in text-based and content-based image retrieval. This collaboration produced an approach that not only exploits the synergy between visual aspects and textual annotations jointly, but also provides an efficient computation method for searching annotated images in large collections from a multimedia query, whether text or one or several images. Besides participations in competitions such as ImageCLEF and MediaEval, this work has produced several publications in conference proceedings, an article in the journal IEEE Transactions on Multimedia, and a doctoral thesis in the NLP&IR-UNED group entitled Fusión Multimedia Semántica Tardía aplicada a la Recuperación de Información Multimedia.

On the other hand, a mixed team formed by members of NLP&IR-UNED and GAVAB-URJC integrated previous technologies to build a hybrid image search system. The proposal, which combined content features with text analysis enriched with linguistic resources such as WordNet, participated in two editions of the Photo Annotation Task at ImageCLEF.

# DIETHA – Diseño, Implementación y Explotación de Técnicas Heurísticas Avanzadas

Principal investigator: Abraham Duarte. Funding entities: Ministerio de Economía y Competitividad (TIN2012-35632-C02-02). Duration: 01/01/2013 – 31/12/2015.

A large number of problems framed within the context of combinatorial optimization are characterized by the high interest associated with their practical resolution. This project deals with five distinct families of combinatorial problems:

• Ordering problems: with applications in VLSI design or in the efficient resolution of systems of equations.
• Location problems: with interest in telecommunication applications such as distribution of signal regenerators or network design.
• Graph-based problems: with applications in the distribution of electronic devices in electronic boards or in image segmentation.
• Routing problems: by focusing on multi-objective problems with applications in the transport of hazardous materials or in recommendation systems.
• Selection problems: with applications in the construction of diverse groups or clustering of documents.

The problems described above will be solved with metaheuristic procedures, among which we can highlight evolutionary algorithms, tabu search, variable neighborhood search, and GRASP, to name some of the best known. For each combinatorial problem, we will propose the most suitable metaheuristic according to its mathematical structure or model. We will mainly focus on the design of novel strategies to obtain high-quality solutions. In addition, we expect to identify general strategies that can easily be applied to other related problems. We will also focus on the efficient and flexible implementation of these strategies, taking advantage of new programming language features and multi-core microprocessors. Finally, we will focus on exploitation through a management platform that integrates the problems addressed above. Simultaneously, we will develop an application to put the algorithms developed during the project into production in the companies interested in our research.

In addition to solving the problems presented above, a second objective of the project is to advance the metaheuristic methodologies themselves. To successfully meet this challenge, the research team includes Nenad Mladenovic, who developed the variable neighborhood search methodology together with Pierre Hansen.

All these problems will be integrated into Optsicom, a software tool that allows the execution of algorithms devoted to solving optimization problems and the analysis of the associated results. Optsicom can be used at two levels: by end users and by researchers in heuristic methods. Along this line, the problems integrated in Optsicom will also be available on the Web through the Optsicom Repository, a web platform for the comprehensive management of optimization problems. This platform will publish all the information associated with an optimization problem: for each problem, it is expected to store the description, algorithms, instances, experimental results, and relevant references. Furthermore, the results obtained by executing the algorithms can be compared using different statistical tests available as part of the tool.