Research Areas

  •   Metaheuristics
  •   Machine Learning
  •   HPC for Bioinformatics

Optimization and Metaheuristics - Metaheuristics combine basic heuristic methods in a higher-level framework that guides the search for solutions, aiming to explore the search space efficiently and effectively across a broad range of optimization problems. The main goal is to find an acceptable solution within an acceptable time. Most metaheuristics combine local improvement (exploitation) with strategies that avoid getting trapped in local optima (exploration).
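The balance between exploitation and exploration can be sketched with a minimal memetic algorithm: a genetic algorithm whose offspring are refined by a local search. This is an illustrative sketch, not the lab's implementation. The toy OneMax objective (maximize the number of 1 bits), the function names `local_search` and `memetic`, and all parameter values are assumptions chosen for clarity.

```python
import random
from typing import Callable, List

Bits = List[int]
Fitness = Callable[[Bits], float]


def local_search(x: Bits, fitness: Fitness) -> Bits:
    """Exploitation: one greedy pass of single-bit-flip hill climbing."""
    best, best_f = x[:], fitness(x)
    for i in range(len(best)):
        candidate = best[:]
        candidate[i] ^= 1  # flip bit i
        f = fitness(candidate)
        if f > best_f:  # keep the flip only if it improves fitness
            best, best_f = candidate, f
    return best


def memetic(fitness: Fitness, n_bits: int, pop_size: int = 20,
            generations: int = 30, p_mut: float = 0.05, seed: int = 0) -> Bits:
    rng = random.Random(seed)
    pop = [local_search([rng.randint(0, 1) for _ in range(n_bits)], fitness)
           for _ in range(pop_size)]

    def tournament() -> Bits:
        # Binary tournament selection: the fitter of two random individuals.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            p1, p2 = tournament(), tournament()
            # Uniform crossover followed by bit-flip mutation (exploration).
            child = [p1[i] if rng.random() < 0.5 else p2[i] for i in range(n_bits)]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(local_search(child, fitness))  # exploitation
        # Elitist replacement: keep the best pop_size of parents + children.
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)


def onemax(x: Bits) -> float:
    """Toy objective (assumption): number of 1 bits; optimum is all ones."""
    return float(sum(x))


best = memetic(onemax, n_bits=32)
print(onemax(best))  # 32.0
```

On OneMax a single local-search pass already reaches the optimum, so the genetic operators only start to matter on rugged objectives; swapping in a problem-specific fitness function shows the interplay between the two components.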

In Bioinformatics, several problems still lack a computational method that can guarantee a minimum solution quality within a feasible time. This is partly because the rules governing biochemical processes and relationships are only partially known, which makes it harder to design efficient computational strategies. Since such problems are classified as NP-complete or NP-hard, techniques are needed that can handle them without an exhaustive search.

Metaheuristics are among the most common and powerful techniques used in such cases. They do not guarantee the optimal solution, but they usually provide a good approximation with limited computational effort.

Keywords - memetic algorithms; distributed metaheuristics; population-based metaheuristics; evolutionary computation

Machine Learning - Machine Learning is a research area of computer science that deals with the development of algorithms capable of learning from data sets that are usually large and complex. These algorithms can infer input-output relationships without explicitly assuming a predetermined model. Learning focuses mainly on automatically discovering predictive models and detecting patterns, relationships, and dependencies in the data, allowing the extraction of implicit information that would be hard to detect through manual analysis.

Machine learning can deal efficiently with non-linear, noisy settings and can properly handle missing data. For these reasons, machine learning algorithms are among the most widely used tools for integrating and analyzing omics data. Three learning paradigms are commonly distinguished: supervised, unsupervised, and semi-supervised. In supervised learning, the model's parameters are fitted using a set of observations, each associated with a known outcome (label). In unsupervised learning, by contrast, no labels are available, and the task can be viewed as “spontaneously” finding patterns and structures in the input data. The third paradigm, semi-supervised learning, combines the two by learning from a small set of labeled observations together with a larger set of unlabeled ones.
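The difference between the supervised and unsupervised paradigms can be sketched in plain Python: a nearest-centroid classifier is fitted using the labels, while k-means groups the same points without ever seeing them. The toy two-dimensional data, the "control"/"case" labels, and the function names are hypothetical, chosen only for illustration; real omics analyses would typically rely on an established library such as scikit-learn.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


# Supervised: the labels are used to fit one centroid per class.
def fit_nearest_centroid(X: Sequence[Point], y: Sequence[str]) -> Dict[str, Point]:
    groups: Dict[str, List[Point]] = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: centroid(pts) for label, pts in groups.items()}


def predict(model: Dict[str, Point], x: Point) -> str:
    """Assign x to the class whose centroid is closest."""
    return min(model, key=lambda label: math.dist(model[label], x))


# Unsupervised: k-means never sees the labels.
def assign_clusters(X: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    return [min(range(len(centers)), key=lambda j: math.dist(centers[j], x)) for x in X]


def kmeans(X: Sequence[Point], k: int, iters: int = 20) -> List[int]:
    # Farthest-first initialisation keeps this sketch deterministic.
    centers = [X[0]]
    while len(centers) < k:
        centers.append(max(X, key=lambda x: min(math.dist(x, c) for c in centers)))
    for _ in range(iters):
        assign = assign_clusters(X, centers)
        # Move each center to the mean of its members (keep it if empty).
        centers = [centroid([x for x, a in zip(X, assign) if a == j])
                   if j in assign else centers[j] for j in range(k)]
    return assign_clusters(X, centers)


X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
y = ["control", "control", "control", "case", "case", "case"]
model = fit_nearest_centroid(X, y)
print(predict(model, (0.5, 0.5)), predict(model, (9.0, 9.0)))  # control case
print(kmeans(X, k=2))  # [0, 0, 0, 1, 1, 1]
```

Note that k-means only returns anonymous cluster indices; mapping them to meaningful groups still requires external knowledge, which is exactly what the labels provide in the supervised case.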

Keywords - feature selection; dimensionality reduction; interpretability in machine learning models; predictive models; neuroevolution; learning heuristics.

HPC for Bioinformatics - Depending on the problem, metaheuristics and machine learning approaches can be so computationally expensive that only small instances of the problem can be solved. To overcome this limitation, parallel models of metaheuristics and machine learning algorithms can be developed, allowing a larger number of plausible solutions to be explored. There are several ways to address the shortage of computing power in bioinformatics: (1) developing new, faster heuristic algorithms (metaheuristics) that reduce the search space for the most time-consuming tasks; (2) implementing these algorithms on specialized chips; and (3) the most promising option, parallel computing.
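One common way to parallelize a population-based metaheuristic is the master-worker (global) model: the search loop stays sequential, but the fitness evaluations of each generation are distributed to parallel workers. The sketch below is illustrative only; `evaluate_population` is a hypothetical helper, and the Rastrigin function merely stands in for an expensive objective.

```python
import math
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def evaluate_population(population: Sequence[T],
                        fitness: Callable[[T], float],
                        executor: Executor) -> List[float]:
    """Master-worker step: send every fitness evaluation to the executor's
    workers and collect the scores in the original population order."""
    return list(executor.map(fitness, population))


def rastrigin(x: Sequence[float]) -> float:
    """Toy multimodal objective (assumption); global minimum 0 at the origin."""
    return 10.0 * len(x) + sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) for xi in x)


population = [[0.0, 0.0], [1.0, 1.0], [0.5, -0.5]]
# A thread pool keeps this demo portable. For CPU-bound pure-Python fitness
# functions, a ProcessPoolExecutor (created under an
# `if __name__ == "__main__":` guard) is what actually uses several cores.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = evaluate_population(population, rastrigin, pool)
print(scores)
```

Because the helper depends only on the `concurrent.futures.Executor` interface, the same call works with a thread pool, a process pool, or any other implementation of that interface, while the metaheuristic itself stays unchanged.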

Parallel computing still requires new paradigms to harness the additional processing power for Bioinformatics. A recent trend in Structural Bioinformatics is to move algorithms from traditional single-core processors to multi-core processors and further to many-core, or massively multi-core, processors. Data-parallel computations with high arithmetic intensity, such as those found in many Bioinformatics problems, can achieve their best performance on Graphics Processing Units (GPUs). In such cases, when the algorithm can be parallelized effectively, the speedup can be significant.
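A standard way to reason about how large that speedup can be is Amdahl's law, which bounds the speedup S on N processors when only a fraction p of the runtime can be parallelized:

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}, \qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}

% Illustrative values (assumptions, not measurements): p = 0.95, N = 1000
S(1000) = \frac{1}{0.05 + 0.00095} = \frac{1}{0.05095} \approx 19.6
```

With these illustrative values the speedup is already close to the ceiling of 1/(1 - p) = 20, so the serial fraction, not the number of cores, becomes the limit. This is why porting as much of the computation as possible into the data-parallel (e.g., GPU) part matters.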

Keywords - GPU computing; massively parallel computing; CUDA.

  •   Systems Biology
  •   Protein Structures
  •   Gene Expression Data
  •   Data Analysis

Systems Biology - Identifying targets of interest from large-scale biological data, building interaction networks between different types of molecules, prospecting for new drug candidates, elucidating molecular mechanisms, and comparing biochemical pathways across evolution: all of this, and much more, can be investigated through Systems Biology.
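A first step in many network analyses is to assemble the interaction network and look for highly connected nodes (hubs), which are often inspected as candidate targets. The sketch below stores an undirected network as an adjacency list and ranks nodes by degree; the interaction list and node names are hypothetical, and degree is only one of several centrality measures that could be used.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_network(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected interaction network stored as an adjacency list."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def rank_hubs(graph: Dict[str, Set[str]], top: int = 3) -> List[Tuple[str, int]]:
    """Rank nodes by degree (number of distinct partners), ties by name."""
    degrees = [(node, len(neighbors)) for node, neighbors in graph.items()]
    return sorted(degrees, key=lambda nd: (-nd[1], nd[0]))[:top]


# Hypothetical interactions between proteins (P*), a metabolite (M1) and a miRNA (miR1).
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "M1"), ("P1", "miR1"),
         ("P2", "P3"), ("P3", "M1"), ("P4", "miR1")]
network = build_network(edges)
print(rank_hubs(network))  # [('P1', 4), ('P3', 3), ('M1', 2)]
```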

Our group has experience applying various Systems Biology tools to different organisms, such as humans, mice, bacteria, plants, and fungi. We also have experience studying the effects of molecules and toxic compounds on biological systems and prospecting for potential drugs. Systems Biology has been increasingly employed in many kinds of studies because of its flexibility and its applicability across all areas of Molecular Biology.

Simulation and Modelling of Protein Structures - The structure of a protein of interest is not always available. Moreover, even when it is, the structure alone does not explain how the molecule behaves in a cellular environment or when bound to other molecules, such as proteins or chemical compounds.

Our group has a track record in applying and developing new tools to predict protein structures efficiently. We also have experience modeling and simulating proteins and protein complexes, and understanding the behavior of proteins bound to chemical compounds. Simulating proteins bound to drugs or other compounds can effectively reduce the cost of selecting new drug candidates. Similarly, simulations can be used to understand how different molecules behave in solution and to optimize biotechnological processes.

Analysis of Gene Expression Data - Most genes in an organism are expressed as RNA molecules, which can be of different types (e.g., mRNA, miRNA, lncRNA). Understanding an organism's gene expression profiles is therefore vital for exploring possible molecular targets of biological and biotechnological interest. These analyses are also crucial for understanding how organisms or tissues respond to different conditions.

We are experienced in analyzing gene expression data, such as microarray and RNA-seq data, from multiple species. We analyze data from any microarray platform and develop tools for expression data analysis. For RNA-seq, our experience is in analyzing data generated on the Illumina platform.
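For RNA-seq, a basic first look at how expression changes between conditions is to normalize raw read counts for library size and compute log2 fold changes. The sketch below uses counts per million (CPM) with a pseudocount; the gene names and counts are hypothetical, and a proper differential expression analysis would rely on statistical tools such as DESeq2 or edgeR rather than on fold changes alone.

```python
import math
from statistics import mean
from typing import Dict, List

Counts = Dict[str, int]  # gene -> raw read count for one sample


def cpm(sample: Counts) -> Dict[str, float]:
    """Counts per million: scale each gene by the sample's library size."""
    total = sum(sample.values())
    return {gene: count * 1e6 / total for gene, count in sample.items()}


def log2_fold_change(control: List[Counts], treated: List[Counts],
                     pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(mean treated CPM / mean control CPM); the pseudocount avoids
    division by zero and log(0) for genes with no reads."""
    ctrl = [cpm(s) for s in control]
    trt = [cpm(s) for s in treated]
    return {
        g: math.log2((mean(s[g] for s in trt) + pseudocount)
                     / (mean(s[g] for s in ctrl) + pseudocount))
        for g in ctrl[0]
    }


# Hypothetical counts: two replicates per condition.
control = [{"geneA": 100, "geneB": 900}, {"geneA": 120, "geneB": 880}]
treated = [{"geneA": 400, "geneB": 600}, {"geneA": 380, "geneB": 620}]
fc = log2_fold_change(control, treated)
print(fc)
```

A positive value means higher expression in the treated samples (geneA here), a negative value lower expression (geneB).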

Machine Learning and Biological Data Analysis - The use of artificial intelligence, particularly machine learning, creates new opportunities for analyzing large volumes of biological data. These techniques can identify patterns that are often difficult to detect with other approaches and play a key role in understanding and solving complex problems in agriculture, livestock farming, the extractive industry, health, and security.

Machine Learning techniques can be used to analyze genomic data to identify new biomarkers with diagnostic or prognostic value, or potential therapeutic targets for treating diseases. Similarly, they can be used to detect SNPs and SNVs of interest in a population. In agriculture, for example, the same techniques can help discover unknown metabolic pathways and defense mechanisms, and their regulation, in studies of plant-pathogen interaction. They can also be applied to discover genetic variants with potential applications in animal genetic improvement.
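Biomarker discovery often starts with feature selection. A simple filter approach scores each feature (for example, a gene's expression level) independently by how well it separates two groups and keeps the top-ranked ones. The sketch below uses Welch's t-statistic; the expression matrix, labels, and gene names are hypothetical, and in practice multiple-testing correction and validation on independent data would be needed before calling any feature a biomarker.

```python
import math
from statistics import mean, variance
from typing import List, Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t-statistic for the difference in means of two groups
    (each group needs at least two values for the sample variance)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0  # guard: zero variance


def rank_features(X: Sequence[Sequence[float]], y: Sequence[int],
                  names: Sequence[str]) -> List[Tuple[str, float]]:
    """Filter-style feature selection: score every feature independently
    and sort by |t|, largest first."""
    scores = []
    for j, name in enumerate(names):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((name, welch_t(pos, neg)))
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)


# Hypothetical expression matrix: rows are samples, columns are genes.
names = ["gene1", "gene2", "gene3"]
X = [[5.0, 1.0, 2.0], [5.5, 1.2, 2.1], [6.0, 0.9, 1.9],
     [1.0, 1.1, 2.2], [1.5, 1.0, 2.0], [0.5, 1.3, 1.8]]
y = [1, 1, 1, 0, 0, 0]
ranking = rank_features(X, y, names)
print(ranking[0][0])  # gene1
```

Filter methods are fast but ignore interactions between features; wrapper approaches, such as metaheuristic searches over feature subsets, address this at a higher computational cost.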

Our laboratory has years of experience creating new machine learning algorithms and applying these techniques to biological and biotechnological problems of interest.

Tools and Datasets

Science is moving towards greater openness, not only in data but also in publications, computer code, and workflows. The SBCB Lab is committed to open science and to free access to tools and datasets. Over the last few years, we have developed several tools, libraries, and datasets.

Publications

  • Journals
  • Proceedings
  • Book Chapter

Journals - Total of 39 publications (2021, 2020, 2019, 2018, 2017, 2015, 2014, 2013, 2012, 2010)

Proceedings - Total of 34 publications (2021, 2020, 2019, 2018, 2017, 2016, 2014, 2013, 2012, 2011, 2008)

Book Chapters - Total of 8 chapters

Laboratory Facilities

Scientific discoveries are closely linked to technological development. The SBCB Lab maintains and continually expands cutting-edge facilities that enable students and scientists to carry out their research.

The SBCB Lab receives cloud computing resources and support from the Microsoft Azure for Research initiative. For supercomputing, the SBCB Lab also has access to the National Center for Supercomputing (CESUP/UFRGS).