Research groups

GC25 Knowledge Discovery and Intelligent Systems in Biomedicine

The Research Group “Knowledge Discovery and Intelligent Systems in Biomedicine” is a consolidated group of the IMIBIC. Our research team comprises 10 researchers and 8 PhD students. Our research covers two main fields: knowledge discovery and data mining, and the application of artificial intelligence techniques to the industrial development of intelligent systems. We have experience in basic research in the areas of big data, machine learning, soft computing and optimization techniques. We also have proven experience in applying these techniques across diverse application domains and, lately, we have focused on applying them to the field of biomedicine.

Our line of work for the coming years focuses on developing data analysis methodologies for solving complex problems of great social relevance in the field of biomedicine, such as melanoma prediction, alternative splicing, and the prediction and description of pathologies related to arterial hypertension, among others. Data analysis techniques play a fundamental role in medical diagnosis, especially with the growth of precision medicine and individualized prognosis. Early diagnosis models thus offer a great advantage for both the patient and the health-care system, while the models we obtain can also shed light on the understanding of diseases.

The principal investigator of the group, Sebastián Ventura Soto, and the rest of the members collaborate with the PAIDI TIC-222 scientific group. Other collaborations include highly competitive research groups at both the national and international levels.

Research Lines

Predictive models are developed with the aim of estimating an output or a set of output values given a set of input features. Depending on the type of output, these models are mainly categorized into classification (discrete outputs) and regression (continuous outputs).

Traditional classification and regression problems estimate a unique output value from a unique input vector but, in recent years, more flexible representations of the input space (multi-instance, multi-view) and the output space (multi-label, multi-target) have been defined. Our studies in this field are based on the development of a wide range of predictive models, both for classic problems and for problems with a more flexible representation of the input and output spaces.
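The difference between these representations can be made concrete with a few toy records. The following is a minimal sketch, not one of the group's models: the record layout, the feature values and the helper names `output_kind` and `input_kind` are all assumptions made for illustration.

```python
# Toy records; every value below is invented for illustration only.
examples = {
    "classification": {"x": [63, 1.2], "y": "high_risk"},                 # one discrete label
    "regression": {"x": [63, 1.2], "y": 7.4},                             # one continuous value
    "multi-label": {"x": [63, 1.2], "y": {"hypertension", "diabetes"}},   # a set of labels
    "multi-target": {"x": [63, 1.2], "y": [140.0, 90.0]},                 # several continuous outputs
    "multi-instance": {"x": [[0.1, 0.9], [0.4, 0.2]], "y": "positive"},   # a bag of vectors, one label
}


def output_kind(y):
    """Classify the output space of a record from the Python type of y."""
    if isinstance(y, str):
        return "classification"
    if isinstance(y, float):
        return "regression"
    if isinstance(y, (set, frozenset)):
        return "multi-label"
    if isinstance(y, (list, tuple)):
        return "multi-target"
    raise TypeError(f"unsupported output type: {type(y).__name__}")


def input_kind(x):
    """A multi-instance input is a non-empty bag whose elements are vectors."""
    if x and all(isinstance(v, (list, tuple)) for v in x):
        return "multi-instance"
    return "single-instance"


for name, record in examples.items():
    print(name, "->", input_kind(record["x"]), "/", output_kind(record["y"]))
```

Note that the multi-instance record still has a single discrete label: the flexibility lies in the input space (a bag of vectors), not in the output.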

Some of our studies apply these models directly to a wide range of real problems in biomedicine, such as predicting the risk of diabetes in patients and deriving diagnoses from clinical texts using multi-label classification.

Pattern mining aims at extracting and describing elements that are somehow related in a database. Patterns, as the key element in data analytics, represent any type of homogeneity and regularity in data, and they serve as good descriptors of intrinsic and important properties of the data. Our studies in this research field focus on extracting knowledge (in the form of relationships) from scratch and on discovering useful information associated with specific variables of interest for the application field. Our research group has a broad background in different types of patterns, including frequent and infrequent patterns defined on discrete and continuous domains, and on different data types such as relational (as well as multirelational) data, sequential data and data defined on ambiguous domains. Last but not least, our research group has developed a wide range of algorithms for mining patterns with regard to one or more target variables of interest, including subgroup discovery approaches and algorithms for the discovery of exceptional models, among others.
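As a rough illustration of what a frequent pattern is, the sketch below counts the support of every itemset of up to two items by brute-force enumeration. It is a deliberately naive example and not one of the group's algorithms; the function name `frequent_itemsets`, its parameters and the toy transactions are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets whose support >= min_support.

    Support is the fraction of transactions that contain the itemset.
    Brute-force enumeration: fine for toy data, not for large databases.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order, no duplicates
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    total = len(transactions)
    return {
        itemset: count / total
        for itemset, count in counts.items()
        if count / total >= min_support
    }


# Invented toy records: each set lists the conditions observed for one patient.
transactions = [
    {"hypertension", "obesity"},
    {"hypertension", "obesity", "smoker"},
    {"hypertension"},
    {"smoker"},
]

patterns = frequent_itemsets(transactions, min_support=0.5)
for itemset, support in sorted(patterns.items()):
    print(itemset, support)
```

With a minimum support of 0.5, the pair ("hypertension", "obesity") is kept because it appears in two of the four transactions, whereas ("hypertension", "smoker") appears only once and is discarded.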

In the context of the big data era, information systems produce a continuous flow of massive collections of data that surpass the storage and computation capabilities of traditional knowledge extraction methods. Big data is characterized by properties that include volume, velocity, variety, veracity, variability, visualization, and value.

In recent years, researchers have focused predominantly on the scalability of data mining algorithms to address the ever-increasing data volume. Distributed computing platforms such as Apache Hadoop and Spark build on the MapReduce programming model to scale out state-of-the-art data mining algorithms to ever-increasing, heterogeneous data volumes. This issue is especially challenging in the biomedical domain, which comprises massive amounts of information from many data sources. Integrating all the available data efficiently and effectively to infer meaningful and accurate conclusions is not straightforward. Our research group has developed scalable algorithms for big data collections, adapted to the needs of 21st-century information systems.
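The MapReduce programming model mentioned above can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle and reduce phases only; it does not use the Hadoop or Spark APIs, and the function names and toy partitions are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit one (key, 1) pair for every item in a record."""
    return [(item, 1) for item in record]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def map_reduce(partitions):
    """Run the three phases sequentially over every partition."""
    mapped = [
        pair
        for partition in partitions
        for record in partition
        for pair in map_phase(record)
    ]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Two invented partitions, standing in for data held on two cluster nodes.
partitions = [
    [["a", "b"], ["a"]],
    [["b", "c"], ["a", "c"]],
]

counts = map_reduce(partitions)
print(counts)
```

On a real cluster the map calls run in parallel on the nodes that hold each partition, and the shuffle moves data across the network; here everything runs in one loop so that the data flow stays visible.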

Workflow technology brings a representation framework to conduct data analysis closer to the application domain, hiding computational and execution requirements, and enabling the development of complex processes for knowledge extraction from heterogeneous data. Thus, workflows are a high-level mechanism to automate and describe processes as a set of activities that work together to produce a desired outcome. In data science, the application of workflows to data-intensive tasks faces significant challenges, relating not only to the decomposition of knowledge extraction methods into processes and activities, but also to the adaptation and arrangement of data-intensive, low-level algorithmic procedures. The development of workflow-based big data solutions requires the analysis of new parallelization solutions for data mining algorithms; their execution on distributed platforms, both clusters (e.g. Hadoop) and cloud-based systems (e.g. Azure); the reuse of processes and workflows among different application domains and problems such as biomedicine or education; the optimisation of high-performance processes for the execution of data-intensive workflows at runtime; and the transformation of complex data structures (e.g. data streams). With the aim of democratizing data science in industrial environments, our research group is working on the construction of workflow-based solutions for improving the deployment and reusability of data mining algorithms and knowledge discovery techniques.
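The idea of a workflow as a set of dependent activities can be sketched as follows. This is a minimal example under stated assumptions, not the group's workflow system: the `run_workflow` function, the activity names and the toy data are hypothetical, and the standard-library `graphlib` module requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def run_workflow(activities, dependencies):
    """Execute activities in an order that respects their dependencies.

    activities:   name -> function taking the dict of earlier results
    dependencies: name -> set of activity names that must run first
    """
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        results[name] = activities[name](results)
    return results


# Hypothetical three-step analysis: load raw values, drop gaps, summarise.
activities = {
    "load": lambda results: [3.0, None, 5.0, 4.0],
    "clean": lambda results: [v for v in results["load"] if v is not None],
    "summarise": lambda results: sum(results["clean"]) / len(results["clean"]),
}
dependencies = {
    "clean": {"load"},
    "summarise": {"clean"},
}

results = run_workflow(activities, dependencies)
print(results["summarise"])
```

Because each activity only sees the results of earlier activities, a step such as "clean" can be reused in another workflow without changes, which is the kind of reuse across domains described above.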


    SEBASENet - Network of Excellence in Search Based Software Engineering 

    Red de Excelencia en Big Data y Análisis de Datos Escalable

    Teoría y Aplicaciones de Minería de Datos

    PAIDI TIC-122


  • biomedical data science
  • medical image
  • machine learning
  • big data
  • descriptive models
  • predictive models
  • healthcare data analytics
  • radiomics
  • deep learning
  • clustering
  • pattern mining
  • classification and regression

Additional Information



Ongoing projects


Sebastián Ventura Soto; EMERging trends in Data analysis; Funding Agency: Spanish Ministry of Economic Affairs and Digital Transformation; Reference: TIN2017-83445-P


Sebastián Ventura Soto; dreaMS - A telehealth tool for monitoring and treatment of Multiple Sclerosis patients; Funding Agency: European Commission; Reference: DIATOMIC-2019-02-002


Ongoing projects


Ventura-Soto, S. EMERging trends in Data analysis. Funding agency: Spanish Ministry of Economy and Competitiveness (MINECO). Reference: TIN2017-83445-P.


Agüera E., Ventura-Soto S. dreaMS - A telehealth tool for monitoring and treatment of Multiple Sclerosis patients. Funding agency: European Commission. Reference: DIATOMIC.

Finished Projects

Ventura-Soto, S. Data Mining with More Flexible Representations. Funding agency: Spanish Ministry of Economy and Competitiveness (MINECO). Reference: TIN2014-55252-P.


Publications in 2020

Reyes O, Perez E, Luque RM, Castano J, Ventura S. A supervised machine learning-based methodology for analyzing dysregulation in splicing machinery: An application in cancer diagnosis. ARTIFICIAL INTELLIGENCE IN MEDICINE. 2020. 108. DOI: 10.1016/j.artmed.2020.101950
IF: 4,383 Q: 1

Moyano JM, Gibaja EL, Cios KJ, Ventura S. Combining multi-label classifiers based on projections of the output space using Evolutionary algorithms. KNOWLEDGE-BASED SYSTEMS. 2020. 196. DOI: 10.1016/j.knosys.2020.105770
IF: 5,921 Q: 1

Gonzalez-Lopez J, Ventura S, Cano A. Distributed multi-label feature selection using individual mutual information measures. KNOWLEDGE-BASED SYSTEMS. 2020. 188. DOI: 10.1016/j.knosys.2019.105052
IF: 5,921 Q: 1

Gonzalez-Lopez J, Ventura S, Cano A. Distributed Selection of Continuous Features in Multilabel Classification Using Mutual Information. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS. 2020. 31 (7):2280-2293 DOI: 10.1109/TNNLS.2019.2944298 
IF: 8,793 Q: 1 D: 1

Jimenez-Vacas JM, Herrero-Aguayo V, Montero-Hidalgo AJ, Gomez-Gomez E, Fuentes-Fayos AC, Leon-Gonzalez AJ, Saez-Martinez P, Alors-Perez E, Pedraza-Arevalo S, Gonzalez-Serrano T, Reyes O, Martinez-Lopez A, Sanchez-Sanchez R, Ventura S, Yubero-Serrano EM, R. Dysregulation of the splicing machinery is directly associated to aggressiveness of prostate cancer. EBIOMEDICINE. 2020. 51. DOI: 10.1016/j.ebiom.2019.11.008
IF: 5,736 Q: 1

Romero C, Ventura S. Educational data mining and learning analytics: An updated survey. WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY. 2020. 10 (3). DOI: 10.1002/widm.1355
IF: 4,476 Q: 1

Luna JM, Pechenizkiy M, Duivesteijn W, Ventura S. Exceptional in so Many Ways-Discovering Descriptors That Display Exceptional Behavior on Contrasting Scenarios. IEEE ACCESS. 2020. 8:200982-200994 DOI: 10.1109/ACCESS.2020.3034885
IF: 3,745 Q: 1

Luna JM, Fournier-Viger P, Ventura S. Extracting User-Centric Knowledge on Two Different Spaces: Concepts and Records. IEEE ACCESS. 2020. 8:134782-134799 DOI: 10.1109/ACCESS.2020.3010852
IF: 3,745 Q: 1

Delgado-Osuna JA, Garcia-Martinez C, Gomez-Barbadillo J, Ventura S. Heuristics for interesting class association rule mining a colorectal cancer database. INFORMATION PROCESSING & MANAGEMENT. 2020. 57 (3). DOI: 10.1016/j.ipm.2020.102207
IF: 4,787 Q: 1 D: 1

Padillo F, Luna JM, Ventura S. LAC: Library for associative classification. KNOWLEDGE-BASED SYSTEMS. 2020. 193. DOI: 10.1016/j.knosys.2019.105432
IF: 5,921 Q: 1

Garcia-Martinez C, Ventura S. Multi-view Genetic Programming Learning to Obtain Interpretable Rule-Based Classifiers for Semi-supervised Contexts. Lessons Learnt. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS. 2020. 13 (1):576-590 DOI: 10.2991/ijcis.d.200511.002 
IF: 1,838 Q: 3

Saeed-Ul H, Aljohani NR, Idrees N, Sarwar R, Nawaz R, Martinez-Camara E, Ventura S, Herrera F. Predicting literature's early impact with sentiment analysis in Twitter. KNOWLEDGE-BASED SYSTEMS. 2020. 192. DOI: 10.1016/j.knosys.2019.105383
IF: 5,921 Q: 1

Fuentes-Fayos AC, Vazquez-Borrego MC, Jimenez-Vacas JM, Bejarano L, Pedraza-Arevalo S, L-Lopez F, Blanco-Acevedo C, Sanchez-Sanchez R, Reyes O, Ventura S, Solivera J, Breunig JJ, Blasco MA, Gahete MD, Castano JP, Luque RM. Splicing machinery dysregulation drives glioblastoma development/aggressiveness: oncogenic role of SRSF3. BRAIN. 2020. 143:3273-3293 DOI: 10.1093/brain/awaa273
IF: 11,337 Q: 1 D: 1

Hassan SU, Aljohani NR, Shabbir M, Ali U, Iqbal S, Sarwar R, Martinez-Camara E, Ventura S, Herrera F. Tweet Coupling: a social media methodology for clustering scientific publications. SCIENTOMETRICS. 2020. 124 (2):973-991 DOI: 10.1007/s11192-020-03499-1 
IF: 2,867 Q: 1

Lopez-Zambrano J, Lara JA, Romero C. Towards portability of models for predicting students’ final performance in university courses starting from Moodle Logs. APPLIED SCIENCES-BASEL. 2020. 10 (1). DOI: 10.3390/app10010354
IF: 2,474 Q: 2

Cerezo R, Bogarin A, Esteban M, Romero C. Process mining for self-regulated learning assessment in e-learning. JOURNAL OF COMPUTING IN HIGHER EDUCATION. 2020. 32 (1). DOI: 10.1007/s12528-019-09225-y
IF: 2,271 Q: 1

Publications in 2019

Ramirez A, Romero JR, Garcia-Martinez C, Ventura S; JCLEC-MO: A Java suite for solving many-objective optimization engineering problems. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE. 2019. 81. 14-28. DOI: 10.1016/j.engappai.2019.02.003. 
IF: 3,526 Q: 1

Ramirez A, Romero JR, Ventura S; A survey of many-objective optimisation in search-based software engineering. JOURNAL OF SYSTEMS AND SOFTWARE. 2019. 149. 382-395. DOI: 10.1016/j.jss.2018.12.015. 
IF: 2,559 Q: 1

Luque C, Luna JM, Luque M, Ventura S; An advanced review on text mining in medicine. WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY. 2019. 9. 3. DOI: 10.1002/widm.1302.
IF: 2,541 Q: 1

Luna JM, Fournier-Viger P, Ventura S; Frequent itemset mining: A 25 years review. WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY. 2019. 0. DOI: 10.1002/widm.1329.
IF: 2,541 Q: 1

Romero C, Ventura S; Guest Editorial: Special Issue on Early Prediction and Supporting of Learning Performance. IEEE TRANSACTIONS ON LEARNING TECHNOLOGIES. 2019. 12. 2. 145-147. DOI: 10.1109/TLT.2019.2908106. 
IF: 2,315 Q: 1

Luna JM, Ondra M, Fardoun HM, Ventura S; Optimization of quality measures in association rule mining: an empirical study. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS. 2019. 12. 1. 59-78. DOI: 10.2991/ijcis.2018.25905182. 
IF: 2,153 Q: 3

Abouzid H, Chakkor O, Reyes OG, Ventura S; Signal speech reconstruction and noise removal using convolutional denoising audioencoders with neural deep learning. ANALOG INTEGRATED CIRCUITS AND SIGNAL PROCESSING. 2019. 100. 3. 501-512. DOI: 10.1007/s10470-019-01446-6. 
IF: 0,823 Q: 4

Maqsood R, Ceravolo P, Ventura S; Discovering Students' Engagement Behaviors in Confidence-based Assessment. PROCEEDINGS OF 2019 IEEE GLOBAL ENGINEERING EDUCATION CONFERENCE (EDUCON). 2019. 0. 841-846. DOI: 10.13140/RG.2.2.30967.68007.

Quintero-Dominguez LA, Morell C, Ventura S; WordificationMI: multi-relational data mining through multiple-instance propositionalization. PROGRESS IN ARTIFICIAL INTELLIGENCE. 2019. 8. 3. 375-387. DOI: 10.1007/s13748-019-00186-y.

Publications in 2017

Cano A, Garcia-Martinez C, Ventura S. Extremely high-dimensional optimization with MapReduce: Scaling functions and algorithm. INFORMATION SCIENCES. 2017. 415:110-127.
IF: 4,832 Q: 1 D: 1

Melki G, Cano A, Kecman V, Ventura S. Multi-target support vector regression via correlation regressor chains. INFORMATION SCIENCES. 2017. 415:53-69.
IF: 4,832 Q: 1 D: 1

Moyano JM, Gibaja EL, Ventura S. MLDA: A tool for analyzing multi-label datasets. KNOWLEDGE-BASED SYSTEMS. 2017. 121:1-3.
IF: 4,529 Q: 1

Cano A, Ventura S, Cios KJ. Multi-objective genetic programming for feature extraction and data visualization. SOFT COMPUTING. 2017. 21(8):2069-2089.
IF: 2,472 Q: 2 

Romero C, Ventura S. Educational data science in massive open online courses. WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY. 2017. 7(1):e1187.
IF: 2,111 Q: 2 

Altalhi AH, Luna JM, Vallejo MA, Ventura S. Evaluation and comparison of open source software suites for data mining and knowledge discovery. WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY. 2017. 7(3):e1204.
IF: 2,111 Q: 2 

Juan AA, Loch B, Daradoumis T, Ventura S. Games and simulation in higher education. INTERNATIONAL JOURNAL OF EDUCATIONAL TECHNOLOGY IN HIGHER EDUCATION. 2017. 14.

Ramirez A, Barbudo R, Romero JR, Ventura S. Memetic Algorithms for the Automatic Discovery of Software Architectures. INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS (ISDA 2016). 2017. 557:437-447.

Padillo F, Luna JM, Ventura S. Mining Perfectly Rare Itemsets on Big Data: An Approach Based on Apriori-Inverse and MapReduce. INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS (ISDA 2016). 2017. 557:508-518.


Principal Investigator
Sebastián Ventura Soto

Carlos García Martínez
Eva Lucrecia Gibaja Galindo
José María Luna Ariza
María Luque Rodríguez
Cristóbal Romero Morales
José Raúl Romero Salguero
Amelia Zafra Gómez

Post-Doctoral Researchers
José María Moyano Murillo
Aurora Ramírez Quesada

Pre-Doctoral Researchers
Rafael Barbudo Lunar
José Antonio Delgado Osuna
Aurora Esteban Toscano
María del Carmen Luque Guzmán
Juan Antonio Marín Sanz
Antonio Rafael Moya Martín-Castaño
Eduardo Pérez Perdomo
Antonio Manuel Trasierras Fresco