The Research Group “Knowledge Discovery and Intelligent Systems in Biomedicine” is a consolidated group of the IMIBIC. Our team comprises 10 researchers and 8 PhD students. Our research covers two main fields: knowledge discovery and data mining, and the application of artificial intelligence techniques to the industrial development of intelligent systems. We have experience in basic research in the areas of big data, machine learning, soft computing and optimization techniques. We also have proven experience in applying these techniques across diverse application domains and, lately, we have focused on applying them to the field of biomedicine.
Our line of work for the coming years focuses on developing data analysis methodologies for solving complex problems of great social relevance in biomedicine, such as melanoma prediction, alternative splicing, and the prediction and description of pathologies related to arterial hypertension, among others. Data analysis techniques play a fundamental role in medical diagnosis, especially with the growth of precision medicine and individualized prognosis. Early diagnosis models therefore offer a great advantage to both the patient and the health-care system, while the models we obtain can also shed light on the understanding of diseases.
The principal investigator of the group, Sebastian Ventura Soto, and the rest of the members collaborate with the PAIDI TIC-222 scientific group. Other collaborations include highly competitive research groups at both the national and international levels.
Research Lines
Predictive models are developed to estimate an output, or a set of output values, from a set of input features. Depending on the type of output, these models are mainly categorized into classification (discrete outputs) and regression (continuous outputs).
Traditional classification and regression problems estimate a single output value from a single input vector but, in recent years, more flexible representations of the input space (multi-instance, multi-view) and the output space (multi-label, multi-target) have been defined. Our studies in this field develop a wide range of predictive models, both for classic problems and for problems with these more flexible input and output representations.
Some of our studies apply these models directly to a wide range of real problems in biomedicine, such as predicting the risk of diabetes in patients and diagnosis from clinical texts using multi-label classification.
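As a minimal illustration of the multi-label setting mentioned above, the following Python sketch trains one independent binary model per label (a strategy usually called binary relevance). The nearest-centroid base model, the function names and the toy data are assumptions chosen for brevity; they are not the group's own algorithms or datasets.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_binary_relevance(
    X: List[Vector], Y: List[Sequence[int]]
) -> List[Tuple[List[float], List[float]]]:
    """Fit one nearest-centroid model per label, ignoring the other labels."""
    models = []
    for j in range(len(Y[0])):
        pos = [x for x, y in zip(X, Y) if y[j] == 1]
        neg = [x for x, y in zip(X, Y) if y[j] == 0]
        if not pos or not neg:
            raise ValueError(f"label {j} needs positive and negative examples")
        models.append((centroid(pos), centroid(neg)))
    return models


def predict(models, x: Vector) -> List[int]:
    """Return a 0/1 vector with one entry per label."""
    return [1 if sq_dist(x, pos) < sq_dist(x, neg) else 0 for pos, neg in models]


# Made-up toy data: two numeric features, two labels per example.
X = [[1.0, 1.0], [1.2, 0.9], [3.0, 1.0], [3.1, 1.1], [1.0, 3.0], [3.0, 3.0]]
Y = [[0, 0], [0, 0], [1, 0], [1, 0], [0, 1], [1, 1]]

models = fit_binary_relevance(X, Y)
print(predict(models, [2.9, 2.8]))  # both labels can be active at once
```

Unlike single-label classification, the prediction is a vector with one decision per label, so an example can receive several labels or none.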
Pattern mining aims to extract and describe elements that are somehow related in a database. Patterns, as the key element in data analytics, represent any type of homogeneity and regularity in data, and they serve as good descriptors of its intrinsic and important properties. Our studies in this field focus on extracting knowledge (in the form of relationships) from scratch and on discovering useful information associated with specific variables of interest for the application field. Our research group has a broad background in different types of patterns, including frequent and infrequent patterns defined on discrete or continuous domains, and on different data types such as relational (as well as multirelational) data, sequential data and data defined on ambiguous domains. Last but not least, our research group has developed a wide range of algorithms for mining patterns with regard to one or more target variables of interest, including subgroup discovery approaches and algorithms for the discovery of exceptional models, among others.
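To make the notion of a frequent pattern concrete, here is a minimal Python sketch of level-wise (Apriori-style) frequent itemset mining. It is a textbook illustration under stated assumptions, not one of the group's algorithms: the function name, the support threshold and the toy records are all hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def frequent_itemsets(
    transactions: List[Iterable[str]], min_support: float
) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose relative support is at least min_support."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)
    result: Dict[FrozenSet[str], float] = {}
    # Level 1 candidates: every single item seen in the data.
    current = [frozenset([i]) for i in sorted({i for t in tsets for i in t})]
    while current:
        # Support = fraction of transactions that contain the candidate.
        support = {c: sum(1 for t in tsets if c <= t) / n for c in current}
        frequent = {c: s for c, s in support.items() if s >= min_support}
        result.update(frequent)
        if not frequent:
            break
        size = len(next(iter(frequent))) + 1
        # Join frequent k-itemsets into (k+1)-item candidates.
        joined = {a | b for a, b in combinations(frequent, 2) if len(a | b) == size}
        # Prune candidates that have an infrequent k-item subset.
        current = [
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, size - 1))
        ]
    return result


# Made-up toy records, each listing conditions observed together.
records = [
    {"hypertension", "obesity", "diabetes"},
    {"hypertension", "obesity"},
    {"hypertension", "diabetes"},
    {"obesity"},
]

patterns = frequent_itemsets(records, min_support=0.5)
for itemset, support in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

The pruning step relies on the fact that an itemset can only be frequent if all of its subsets are frequent, which keeps the number of candidates manageable.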
In the context of the big data era, information systems produce a continuous flow of massive collections of data that surpass the storage and computation capabilities of traditional knowledge extraction methods. Big data is commonly characterized by properties that include volume, velocity, variety, veracity, variability, visualization, and value.
In recent years, researchers have focused predominantly on the scalability of data mining algorithms to address the ever-increasing data volume. Distributed computing platforms such as Apache Hadoop and Spark build on the MapReduce programming model to scale out state-of-the-art data mining algorithms to ever-increasing, heterogeneous data volumes. This issue is especially challenging in the biomedical domain, which comprises massive amounts of information from many data sources. Integrating all the available data efficiently and effectively, so as to infer meaningful and accurate conclusions, is not straightforward. Our research group has developed scalable algorithms for big data collections, adapted to the needs of 21st-century information systems.
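The MapReduce programming model mentioned above can be sketched in a few lines of plain Python. This single-process example only imitates the map, shuffle and reduce phases that a platform such as Hadoop or Spark would distribute across a cluster; the function names and the toy partitions (lists of diagnosis codes) are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(record: List[str]) -> List[Tuple[str, int]]:
    """Emit a (code, 1) pair for every code in one record."""
    return [(code, 1) for code in record]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


def map_reduce(partitions: List[List[List[str]]]) -> Dict[str, int]:
    """Run map over every record of every partition, then shuffle and reduce."""
    mapped = chain.from_iterable(
        map_phase(record) for part in partitions for record in part
    )
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Made-up toy data: two partitions, each holding records of diagnosis codes.
partitions = [
    [["I10", "E11"], ["I10"]],
    [["E11", "C43"], ["I10", "C43"]],
]

counts = map_reduce(partitions)
print(counts)
```

Because each record is mapped independently and each key is reduced independently, both phases can run in parallel on different machines, which is what makes the model scale.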
Workflow technology provides a representation framework that brings data analysis closer to the application domain, hiding computational and execution requirements and enabling the development of complex processes for knowledge extraction from heterogeneous data. Workflows are thus a high-level mechanism to automate and describe processes as a set of activities that work together to produce a desired outcome. In data science, applying workflows to data-intensive tasks faces significant challenges, relating not only to the decomposition of knowledge extraction methods into processes and activities, but also to the adaptation and arrangement of data-intensive, low-level algorithmic procedures. Developing workflow-based big data solutions requires the analysis of new parallelization solutions for data mining algorithms; their execution on distributed platforms, both clusters (e.g. Hadoop) and cloud-based systems (e.g. Azure); the reuse of processes and workflows across different application domains and problems, such as biomedicine or education; the optimization of high-performance processes for the execution of data-intensive workflows at runtime; and the transformation of complex data structures (e.g. data streams). With the aim of democratizing data science in industrial environments, our research group is working on the construction of workflow-based solutions for improving the deployment and reusability of data mining algorithms and knowledge discovery techniques.
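The idea of a workflow as a set of dependent activities can be sketched as a small directed acyclic graph executed in dependency order. This is only an illustrative sketch using Python's standard graphlib module (Python 3.9 or later); the run_workflow helper, the activity names and the toy data are hypothetical and do not describe the group's workflow systems.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Set, Tuple

Activity = Callable[[Dict[str, Any]], Any]


def run_workflow(
    activities: Dict[str, Activity], dependencies: Dict[str, Set[str]]
) -> Tuple[Dict[str, Any], List[str]]:
    """Run each activity once all the activities it depends on have finished.

    Each activity receives the results produced so far and returns its own.
    """
    results: Dict[str, Any] = {}
    order: List[str] = []
    for name in TopologicalSorter(dependencies).static_order():
        results[name] = activities[name](results)
        order.append(name)
    return results, order


# Hypothetical three-step analysis: load -> clean -> summarise.
activities: Dict[str, Activity] = {
    "load": lambda r: [3, None, 5, 4],  # made-up measurements
    "clean": lambda r: [v for v in r["load"] if v is not None],
    "summarise": lambda r: sum(r["clean"]) / len(r["clean"]),
}
dependencies = {"clean": {"load"}, "summarise": {"clean"}}

results, order = run_workflow(activities, dependencies)
print(order, results["summarise"])
```

Keeping the activities separate from the dependency graph is what allows the same step, such as the cleaning activity here, to be reused in a different workflow.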
Networks
SEBASENet - Network of Excellence in Search Based Software Engineering
Red de Excelencia en Big Data y Análisis de Datos Escalable
Teoría y Aplicaciones de Minería de Datos
PAIDI TIC-122
Keywords
- biomedical data science
- medical image
- machine learning
- big data
- descriptive models
- predictive models
- healthcare data analytics
- radiomics
- deep learning
- clustering
- pattern mining
- classification and regression
Additional Information
Our website: Knowledge Discovery and Intelligent Systems