Research groups

GC25 Knowledge Discovery and Intelligent Systems in Biomedicine

The Research Group “Knowledge Discovery and Intelligent Systems in Biomedicine” is a consolidated group of the IMIBIC. Our research team comprises 10 researchers and 5 PhD students. Our research covers two main fields: knowledge discovery and data mining, and the application of artificial intelligence techniques to the industrial development of intelligent systems. We have experience in basic research in the areas of big data, machine learning, soft computing and optimization techniques. We also have proven experience in applying these techniques across diverse application domains and, lately, we have focused on applying them to the field of biomedicine.

Our line of work for the coming years focuses on developing data analysis methodologies for solving complex problems of great social relevance in biomedicine, such as melanoma prediction, alternative splicing, and the prediction and description of pathologies related to arterial hypertension, among others. Data analysis techniques play a fundamental role in medical diagnosis, especially with the growth of precision medicine and individualized prognosis. Early diagnosis models offer a great advantage for both the patient and the health-care system, and the models we obtain can also shed light on the understanding of diseases.

The principal investigator of the group, Sebastián Ventura Soto, and the rest of the members collaborate with the PAIDI TIC-222 scientific group. Other collaborations include highly competitive research groups at both the national and international levels.

Research Lines

Predictive models are developed to estimate an output, or a set of output values, from a set of input features. Depending on the type of output, these models are mainly categorized into classification (discrete outputs) and regression (continuous outputs).

Traditional classification and regression problems estimate a single output value from a single input vector but, in recent years, more flexible representations of the input space (multi-instance, multi-view) and the output space (multi-label, multi-target) have been defined. Our studies in this field develop a wide range of predictive models, both for classic problems and for problems with these more flexible representations of the input and output spaces.
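To make these flexible representations concrete, the following minimal sketch shows one well-known way of handling a multi-label output space (binary relevance, which splits the problem into one binary problem per label) and the standard multi-instance assumption for input bags. It is an illustration only, not the group's own method; the toy patient data, label names and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy data: a feature vector paired with a SET of labels
# (multi-label output), instead of a single class value.
Example = Tuple[Sequence[float], Set[str]]
BinaryExample = Tuple[Sequence[float], int]


def binary_relevance(dataset: List[Example]) -> Dict[str, List[BinaryExample]]:
    """Split a multi-label dataset into one binary dataset per label."""
    labels = sorted(set().union(*(y for _, y in dataset)))
    return {
        label: [(x, int(label in y)) for x, y in dataset]
        for label in labels
    }


def bag_label(instance_labels: Sequence[int]) -> int:
    """Standard multi-instance assumption: a bag of instances is
    positive if at least one of its instances is positive."""
    return int(any(instance_labels))


multi_label_data: List[Example] = [
    ([0.2, 1.0], {"hypertension"}),
    ([0.9, 0.1], {"diabetes", "hypertension"}),
    ([0.5, 0.5], set()),
]
problems = binary_relevance(multi_label_data)
```

Each entry of `problems` can then be passed to any ordinary binary classifier. Binary relevance ignores correlations between labels, which is one reason more specialised multi-label models exist.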

Some of our studies directly apply these models to real problems in biomedicine, such as predicting the risk of diabetes in patients and diagnosing from clinical texts using multi-label classification.

Pattern mining aims at extracting and describing elements that are somehow related in a database. Patterns, as the key element in data analytics, represent any type of homogeneity and regularity in data, and they serve as good descriptors of its intrinsic and important properties. Our studies in this research field focus on extracting knowledge (in the form of relationships) from scratch and on discovering useful information associated with specific variables of interest for the application field. Our research group has a broad background in different types of patterns, including frequent and infrequent patterns defined on discrete and continuous domains, and on different data types such as relational (as well as multi-relational) data, sequential data and data defined on ambiguous domains. Our research group has also developed a wide range of algorithms for mining patterns with regard to one or more target variables of interest, including subgroup discovery approaches and algorithms for the discovery of exceptional models, among others.
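As an illustration of what a frequent pattern is, the sketch below implements a small level-wise (Apriori-style) search for frequent itemsets in pure Python. It is a textbook-style example under simplifying assumptions, not one of the group's algorithms; the transactions and the `min_support` threshold are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_itemsets(
    transactions: List[Set[str]], min_support: float
) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose relative support is >= min_support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    candidates = [frozenset([item]) for item in items]
    result: Dict[FrozenSet[str], float] = {}
    k = 1
    while candidates:
        # Count the support of each candidate of size k.
        level = {}
        for candidate in candidates:
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                level[candidate] = support
        result.update(level)
        # Build candidates of size k + 1, keeping only those whose
        # subsets of size k are all frequent (anti-monotone pruning).
        k += 1
        frequent_items = sorted(set().union(*level)) if level else []
        candidates = [
            frozenset(combo)
            for combo in combinations(frequent_items, k)
            if all(frozenset(sub) in level for sub in combinations(combo, k - 1))
        ]
    return result


# Hypothetical records: each set lists the conditions observed in one patient.
records = [
    {"hypertension", "obesity", "diabetes"},
    {"hypertension", "obesity"},
    {"hypertension", "diabetes"},
    {"obesity"},
]
patterns = frequent_itemsets(records, min_support=0.5)
```

The pruning step relies on the fact that an itemset cannot be frequent unless all of its subsets are frequent, which keeps the number of candidates manageable on larger data.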

In the big data era, information systems produce a continuous flow of massive data collections that surpass the storage and computation capabilities of traditional knowledge extraction methods. Big data is characterized by properties that include volume, velocity, variety, veracity, variability, visualization, and value.

In recent years, researchers have focused predominantly on the scalability of data mining algorithms to address the ever-increasing data volume. Distributed computing platforms such as Apache Hadoop and Apache Spark implement or generalize the MapReduce programming model to scale out state-of-the-art data mining algorithms to ever-increasing, heterogeneous data volumes. This issue is especially challenging in the biomedical domain, which comprises massive amounts of information from many data sources. Integrating all the available data efficiently and effectively to infer meaningful and accurate conclusions is not straightforward. Our research group has developed scalable algorithms for big data collections, adapted to the needs of 21st-century information systems.
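The MapReduce programming model mentioned above can be sketched in a few lines. The example below is a single-process simulation in pure Python that counts item occurrences across data partitions; it does not use the Hadoop or Spark APIs, and the function names and toy partitions are hypothetical. On a real cluster the map and reduce phases would run in parallel on different nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(record: List[str]) -> List[Tuple[str, int]]:
    """Map: emit a (key, 1) pair for every item in one record."""
    return [(item, 1) for item in record]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate the values collected for one key."""
    return key, sum(values)


def map_reduce(partitions: List[List[List[str]]]) -> Dict[str, int]:
    """Run map, shuffle and reduce sequentially over all partitions."""
    mapped = [
        pair
        for partition in partitions
        for record in partition
        for pair in map_phase(record)
    ]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Two hypothetical partitions, as if the data were split across two nodes.
partitions = [
    [["a", "b"], ["a"]],
    [["b", "c"], ["a"]],
]
counts = map_reduce(partitions)
```

Because each record is mapped independently and each key is reduced independently, both phases can be distributed without changing the result.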

Workflow technology provides a representation framework that brings data analysis closer to the application domain, hiding computational and execution requirements and enabling the development of complex processes for knowledge extraction from heterogeneous data. Workflows are thus a high-level mechanism to automate and describe processes as a set of activities that work together to produce a desired outcome. In data science, applying workflows to data-intensive tasks faces significant challenges, concerning not only the decomposition of knowledge extraction methods into processes and activities, but also the adaptation and arrangement of data-intensive, low-level algorithmic procedures. The development of workflow-based big data solutions requires the analysis of new parallelization solutions for data mining algorithms; their execution on distributed platforms, both clusters (e.g. Hadoop) and cloud-based systems (e.g. Azure); the reuse of processes and workflows across different application domains and problems, such as biomedicine or education; the optimisation of high-performance processes for the execution of data-intensive workflows at runtime; and the transformation of complex data structures (e.g. data streams). With the aim of democratizing data science in industrial environments, our research group is working on the construction of workflow-based solutions for improving the deployment and reusability of data mining algorithms and knowledge discovery techniques.
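The idea of a workflow as a set of activities with dependencies can be sketched as a small directed acyclic graph executed in dependency order. The example below is a minimal illustration using Python's standard `graphlib` module (Python 3.9 or later); the activity names, the `run_workflow` function and the toy data are hypothetical and do not describe the group's actual tools.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List

# An activity receives the results produced so far and returns its own result.
Activity = Callable[[Dict[str, Any]], Any]


def run_workflow(
    activities: Dict[str, Activity], depends_on: Dict[str, List[str]]
) -> Dict[str, Any]:
    """Execute each activity once, after all the activities it depends on."""
    graph = {name: depends_on.get(name, []) for name in activities}
    results: Dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():
        results[name] = activities[name](results)
    return results


# Hypothetical four-step analysis: load, clean, aggregate, report.
activities: Dict[str, Activity] = {
    "load": lambda r: [3.0, 1.0, None, 2.0],
    "clean": lambda r: [v for v in r["load"] if v is not None],
    "mean": lambda r: sum(r["clean"]) / len(r["clean"]),
    "report": lambda r: f"mean={r['mean']:.1f} over {len(r['clean'])} values",
}
depends_on = {
    "clean": ["load"],
    "mean": ["clean"],
    "report": ["mean", "clean"],
}
results = run_workflow(activities, depends_on)
```

Keeping the activities separate from the dependency graph is what allows the same activity to be reused in different workflows, and lets an engine decide where and in what order to run them.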



  • Biomedical data analysis
  • big data
  • data science
  • data mining
  • machine learning
  • predictive models
  • classification
  • regression
  • descriptive models
  • association
  • clusters

Additional Information

Principal Investigator
Sebastián Ventura Soto

Cristóbal Romero
Carlos García-Martínez
Amelia Zafra
José Raúl Romero
Eva Lucrecia Gibaja
María Luque Rodríguez
Alberto Cano Rojas

Post-Doctoral Researchers
José María Luna Ariza

Pre-Doctoral Researchers
Carmen Luque Guzmán
José Antonio Delgado Osuna
José María Moyano Murillo (FPU grant)
Eduardo Pérez Perdomo (iPFIS grant)
Aurora Ramírez (FPU grant)


Active in 2019


Ventura-Soto, S. EMERging trends in Data analysis. Funding agency: Spanish Ministry of Economy and Competitiveness (MINECO). Reference: TIN2017-83445-P.


Agüera E., Ventura-Soto S. dreaMS - A telehealth tool for monitoring and treatment of Multiple Sclerosis patients. Funding agency: European Commission. Reference: DIATOMIC.

Finished Projects

Ventura-Soto, S. Data Mining with More Flexible Representations. Funding agency: Spanish Ministry of Economy and Competitiveness (MINECO). Reference: TIN2014-55252-P.


Publications 2019

Ramirez A, Romero JR, Garcia-Martinez C, Ventura S; JCLEC-MO: A Java suite for solving many-objective optimization engineering problems. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE. 2019. 81. 14-28. DOI: 10.1016/j.engappai.2019.02.003. 
IF: 3.526
Q: 1

Ramirez A, Romero JR, Ventura S; A survey of many-objective optimisation in search-based software engineering. JOURNAL OF SYSTEMS AND SOFTWARE. 2019. 149. 382-395. DOI: 10.1016/j.jss.2018.12.015. 
IF: 2.559
Q: 1

Luque C, Luna JM, Luque M, Ventura S; An advanced review on text mining in medicine. WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY. 2019. 9. 3. DOI: 10.1002/widm.1302.
IF: 2.541
Q: 1

Luna JM, Fournier-Viger P, Ventura S; Frequent itemset mining: A 25 years review. WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY. 2019. 0. DOI: 10.1002/widm.1329.
IF: 2.541
Q: 1

Romero C, Ventura S; Guest Editorial: Special Issue on Early Prediction and Supporting of Learning Performance. IEEE TRANSACTIONS ON LEARNING TECHNOLOGIES. 2019. 12. 2. 145-147. DOI: 10.1109/TLT.2019.2908106. 
IF: 2.315
Q: 1

Luna JM, Ondra M, Fardoun HM, Ventura S; Optimization of quality measures in association rule mining: an empirical study. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS. 2019. 12. 1. 59-78. DOI: 10.2991/ijcis.2018.25905182. 
IF: 2.153
Q: 3

Abouzid H, Chakkor O, Reyes OG, Ventura S; Signal speech reconstruction and noise removal using convolutional denoising audioencoders with neural deep learning. ANALOG INTEGRATED CIRCUITS AND SIGNAL PROCESSING. 2019. 100. 3. 501-512. DOI: 10.1007/s10470-019-01446-6. 
IF: 0.823
Q: 4

Maqsood R, Ceravolo P, Ventura S; Discovering Students' Engagement Behaviors in Confidence-based Assessment. PROCEEDINGS OF 2019 IEEE GLOBAL ENGINEERING EDUCATION CONFERENCE (EDUCON). 2019. 0. 841-846. DOI: 10.13140/RG.2.2.30967.68007.

Quintero-Dominguez LA, Morell C, Ventura S; WordificationMI: multi-relational data mining through multiple-instance propositionalization. PROGRESS IN ARTIFICIAL INTELLIGENCE. 2019. 8. 3. 375-387. DOI: 10.1007/s13748-019-00186-y.

Main Publications 2017

Cano A, Garcia-Martinez C, Ventura S. Extremely high-dimensional optimization with MapReduce: Scaling functions and algorithm. INFORMATION SCIENCES. 2017. 415. 110-127.
IF: 4.832
Q: 1  D: 1

Melki G, Cano A, Kecman V, Ventura S. Multi-target support vector regression via correlation regressor chains. INFORMATION SCIENCES. 2017. 415. 53-69.
IF: 4.832
Q: 1  D: 1

Moyano JM, Gibaja EL, Ventura S. MLDA: A tool for analyzing multi-label datasets. KNOWLEDGE-BASED SYSTEMS. 2017. 121. 1-3.
Q: 1 

Cano A, Ventura S, Cios KJ. Multi-objective genetic programming for feature extraction and data visualization. SOFT COMPUTING. 2017. 21. 8. 2069-2089.
IF: 2.472
Q: 2 

Romero C, Ventura S. Educational data science in massive open online courses. WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY. 2017. 7. 1. e1187.
IF: 2.111
Q: 2 

Altalhi AH, Luna JM, Vallejo MA, Ventura S. Evaluation and comparison of open source software suites for data mining and knowledge discovery. WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY. 2017. 7. 3. e1204.
IF: 2.111
Q: 2 

Juan AA, Loch B, Daradoumis T, Ventura S. Games and simulation in higher education. INTERNATIONAL JOURNAL OF EDUCATIONAL TECHNOLOGY IN HIGHER EDUCATION. 2017. 14.

Ramirez A, Barbudo R, Romero JR, Ventura S. Memetic Algorithms for the Automatic Discovery of Software Architectures. INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS (ISDA 2016). 2017. 557. 437-447.

Padillo F, Luna JM, Ventura S. Mining Perfectly Rare Itemsets on Big Data: An Approach Based on Apriori-Inverse and MapReduce. INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS (ISDA 2016). 2017. 557. 508-518.


