Optimization and Metaheuristics -
Metaheuristics combine basic heuristic methods in a higher-level framework that guides
the search for a solution, aiming to explore the search space efficiently and
effectively across a broad range of optimization problems. The main goal is to find an
acceptable solution in an acceptable time. Most metaheuristics combine local
improvement procedures (exploitation) with strategies that avoid being trapped in
local optima (exploration).
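The interplay between exploitation and exploration can be illustrated with a minimal
sketch of iterated local search, one simple metaheuristic among many. The objective
function, the step sizes, and all function names below are hypothetical choices made
only for illustration; they are not the group's own methods.

```python
import math
import random


def objective(x):
    """Hypothetical multimodal test function with several local minima."""
    return x * x + 10.0 * math.sin(3.0 * x)


def local_search(x, step=0.05, iters=200, rng=random):
    """Exploitation: accept only small moves that improve the objective."""
    best, best_val = x, objective(x)
    for _ in range(iters):
        cand = best + rng.uniform(-step, step)
        val = objective(cand)
        if val < best_val:
            best, best_val = cand, val
    return best, best_val


def iterated_local_search(start, restarts=30, kick=2.0, seed=0):
    """Exploration: perturb ('kick') the incumbent, then re-run local search."""
    rng = random.Random(seed)
    best, best_val = local_search(start, rng=rng)
    for _ in range(restarts):
        cand, val = local_search(best + rng.uniform(-kick, kick), rng=rng)
        if val < best_val:  # keep the perturbed solution only if it is better
            best, best_val = cand, val
    return best, best_val


if __name__ == "__main__":
    x, val = iterated_local_search(start=4.0)
    print(f"best x = {x:.3f}, objective = {val:.3f}")
```

The local search alone stops at the first local minimum it reaches; the random kicks
give the search a chance to leave that basin, at the cost of extra evaluations.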
In Bioinformatics, several problems still lack a computational method that can
guarantee a minimum solution quality in a feasible time. This is because the rules
that govern the biochemical processes and relations are only partially known, which
makes it harder to design efficient computational strategies. Since such problems are
classified as NP-Complete or NP-Hard, computational techniques that can cope with
this complexity are needed.
Metaheuristics are among the most common and powerful techniques used in this case.
They do not guarantee the optimal solution, but they can give a good approximation
with limited computational effort.
Keywords - memetic algorithms; distributed metaheuristics; population-based
metaheuristics; evolutionary computation
Machine Learning -
Machine Learning is a research area of computer science that deals with the
development of algorithms capable of learning from data sets that are usually
voluminous and complex. These algorithms can infer input-output relationships
without explicitly assuming a pre-determined model. Learning focuses mainly
on automatically discovering predictive models and detecting patterns,
relationships and dependencies in the data, which allows the extraction of
implicit information that could hardly be detected through manual analysis.
Machine learning can deal efficiently with non-linear, noisy environments
and handle missing data properly. For these reasons, machine learning
algorithms are among the most widely used for integrating and analyzing
omics data. Three learning paradigms are commonly distinguished: supervised,
unsupervised and semi-supervised. Supervised learning is a process in which
the model is parameterized by using a set of observations, each associated
with a known outcome (label). In contrast, in unsupervised learning the
labels are not available; it can be viewed as the task of "spontaneously"
finding patterns and structures in the input data. The third paradigm,
semi-supervised learning, combines the two, typically using a small set of
labeled observations together with a larger set of unlabeled ones.
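The difference between the supervised and unsupervised paradigms can be sketched on
a toy example. The code below is only an illustration under stated assumptions: the
data points and labels are made up, and the two methods (a nearest-centroid
classifier and a naive k-means) are chosen for brevity, not because the text
prescribes them.

```python
def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_nearest_centroid(X, y):
    """Supervised: the labels are used to compute one centroid per class."""
    groups = {}
    for point, label in zip(X, y):
        groups.setdefault(label, []).append(point)
    return {label: mean(pts) for label, pts in groups.items()}


def predict(centroids, point):
    """Assign the label of the closest class centroid."""
    return min(centroids, key=lambda label: sq_dist(centroids[label], point))


def kmeans(X, k, iters=20):
    """Unsupervised: no labels; groups are found from the data alone."""
    centroids = list(X[:k])  # naive initialization: the first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for point in X:
            idx = min(range(k), key=lambda i: sq_dist(centroids[i], point))
            clusters[idx].append(point)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return [min(range(k), key=lambda i: sq_dist(centroids[i], p)) for p in X]


# Made-up toy data: two well-separated groups of 2-D observations.
X = [(1.0, 1.1), (5.0, 5.2), (0.9, 1.0), (5.1, 4.9), (1.2, 0.8), (4.8, 5.0)]
y = ["low", "high", "low", "high", "low", "high"]

model = fit_nearest_centroid(X, y)   # uses the labels
clusters = kmeans(X, k=2)            # ignores the labels
```

The supervised model returns named classes because it was given them; k-means only
returns anonymous cluster indices, which an analyst still has to interpret.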
Keywords - feature selection; dimensionality reduction;
interpretability in machine learning models; predictive models;
neuroevolution; learning heuristics.
HPC for Bioinformatics -
Depending on the problem, metaheuristics and machine learning approaches
can be computationally expensive, so that only small instances of a
problem can be solved. To overcome this, it is possible to develop
parallel models of metaheuristics and machine learning algorithms,
which allow a larger number of plausible solutions to be explored.
There are several ways to address the lack of raw computing power for
bioinformatics: (1) developing new and faster heuristic algorithms
(metaheuristics) that reduce the computational search space for the
most time-consuming tasks; (2) incorporating these algorithms into
specialized chips; and (3) the most promising option, parallel
computing.
Parallel computing still requires
new paradigms in order to harness the additional processing power
for Bioinformatics. A recent trend in Structural Bioinformatics is
to move the algorithms from traditional, single-core processors to
multi-core processors and further to many-core or massively multi-core
processors. Data-parallel computations with high arithmetic intensity,
such as those present in many Bioinformatics problems, can attain high
performance on Graphics Processing Units (GPUs). In such cases, when the
algorithm can be parallelized effectively, the speedup can be significant.
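How much speedup "effective parallelization" can deliver is often estimated with
Amdahl's law, a standard model that the text above does not mention and that is
brought in here only as an illustration. The sketch assumes a fixed problem size and
ignores overheads such as host-to-device data transfers; the function name and the
example values are hypothetical.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Theoretical speedup when a fraction of the runtime is parallelized.

    parallel_fraction: share of the sequential runtime that can run in
        parallel (between 0.0 and 1.0).
    workers: number of processing units (at least 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Even with 1000 workers, a 5% serial part caps the speedup below 20x.
    for p in (0.50, 0.95, 0.99):
        print(p, round(amdahl_speedup(p, 1000), 1))
```

Under this model the serial fraction dominates quickly, which is why the proportion
of an algorithm that is truly data-parallel matters more than the raw number of
cores.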
Keywords - GPU computing; massively parallel computing; CUDA.
Systems Biology -
Identification of targets of interest derived from large-scale
biological data, creation of interaction networks between different
types of molecules, prospecting for possible new drugs,
elucidation of molecular mechanisms, and evolutionary comparison
between biochemical pathways: all this and much more can be
investigated through Systems Biology.
Our group has experience with applying various tools of Systems
Biology in different organisms such as humans, mice, bacteria,
plants, and fungi. Likewise, we have experience understanding the
effects of molecules and toxic compounds in biological systems and
prospecting for potential drugs. Systems Biology has been
increasingly employed in multiple types of work due to its
flexibility and its applicability across all areas of Molecular Biology.
Keywords - Biological Networks; Biochemical Pathways; Molecular Mechanisms; Interactomes; Large-Scale Biological Data.
Simulation and Modelling of Protein Structures -
A protein of interest does not always have its structure available.
Moreover, even when available, that structure does not explain
how that molecule behaves when in a cellular environment or bound
to other molecules, such as proteins or chemical compounds.
Our group has a track record in applying and creating new
tools to predict protein structures efficiently. Likewise, we
have experience in modeling and simulating proteins and protein
complexes, and in understanding the behavior of proteins bound to
chemical compounds. The simulation of proteins bound to drugs or
other compounds is directly linked to an effective reduction in
the costs of choosing new drugs of interest. Similarly, such
simulations can be used to understand the behavior of different
molecules in solution and to optimize biotechnological processes.
Keywords - Protein Structure Prediction; Protein Complexes; Molecular Docking; Molecular Dynamics; Biotechnological Processes.
Analysis of Gene Expression Data -
Most genes in an organism are expressed as RNA molecules, which
can be of different types (e.g., mRNA, miRNA, lncRNA).
Thus, understanding an organism's gene expression profiles is
vital for exploring possible molecular targets of biological and
biotechnological interest. Similarly, these analyses are crucial
to understanding how organisms or tissues respond to different
conditions.
We are experienced in analyzing gene expression data, such as
microarray and RNA-seq data, from multiple species. We analyze
data from any microarray platform and devise tools for the
analysis of expression data. For RNA-seq, we are experienced in
analyzing data from the Illumina platform.
Keywords - Microarrays; RNA-Seq; SNPs; SNVs; Isoforms; LncRNA.
Machine Learning and Biological Data Analysis -
The use of artificial intelligence, particularly machine learning,
creates new opportunities for analyzing large volumes of biological
data. These techniques make it possible to identify patterns that are
often difficult to detect by other approaches and play a key role in
understanding and solving complex problems in agriculture,
livestock, the extractive industries, health, and security.
Machine Learning techniques can be used to analyze genomic data,
seeking to identify new biomarkers with diagnostic or prognostic
value, or as potential therapeutic targets in the treatment of
diseases. Similarly, they can be utilized to detect SNPs and SNVs
of interest in a population. These same techniques can be used,
for example, in agriculture to discover unknown metabolic
pathways and defense mechanisms, and their regulation, in the
study of plant-pathogen interactions. They can also be applied to
the discovery of genetic variants with potential applications in
animal genetic improvement.
Our laboratory has years of experience creating new algorithms
for machine learning and applying these techniques to different
problems of biological and biotechnological interest.
Keywords - Data Science; Big Data Analytics; Translational Data Science; Feature Selection; Feature Extraction; Predictive Models; Data Mining.