Optimization and Metaheuristics -
Metaheuristics combine basic heuristic methods into a higher-level framework that
guides the search and explores the search space efficiently and effectively across
a broad range of optimization problems. The main goal is to find an acceptable
solution in an acceptable time. Most metaheuristics rely on an interplay between
local improvement (exploitation) and strategies that avoid getting trapped in
local optima (exploration).
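As a minimal sketch of this exploitation/exploration balance, consider a basic simulated-annealing loop, one classic metaheuristic. The objective function and all parameter values below are illustrative assumptions, not taken from any specific Bioinformatics problem:

```python
import math
import random

def simulated_annealing(f, x0, n_iters=5000, t0=1.0, cooling=0.999, step=0.5):
    """Minimize f with a basic simulated-annealing loop.

    Exploitation: moves that improve f are always accepted.
    Exploration: worse moves are accepted with probability exp(-delta/T),
    so the search can escape local optima; T shrinks over time.
    """
    rng = random.Random(0)                      # fixed seed for reproducibility
    x, fx = x0, f(x0)
    best_x, best_fx = x, fx
    t = t0
    for _ in range(n_iters):
        cand = x + rng.uniform(-step, step)     # random neighbour of x
        delta = f(cand) - fx
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x, fx = cand, fx + delta
        if fx < best_fx:
            best_x, best_fx = x, fx
        t *= cooling                            # cooling schedule: less exploration over time
    return best_x, best_fx

# Illustrative multimodal objective; its global minimum is at x = 0.
f = lambda x: x * x + 3 * math.sin(5 * x) ** 2
best_x, best_fx = simulated_annealing(f, x0=8.0)
```

The loop always accepts improvements (exploitation) but, early on, also accepts worsening moves with temperature-dependent probability (exploration), which is what lets it leave the local basins created by the sine term.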
In Bioinformatics there are several problems for which no computational method
can yet guarantee a minimum solution quality in feasible time. This is because
the rules that govern the biochemical processes and relations are only partially
known, making it harder to design efficient computational strategies.
Since such problems are classified as NP-Complete or NP-Hard, computational
techniques that can cope with this complexity are needed.
Metaheuristics are among the most common and powerful techniques used in such cases.
They do not guarantee the optimal solution, but they provide a good approximation
with limited computational effort.
Keywords - memetic algorithms; distributed meta-heuristics;
metaheuristics; evolutionary computation
Machine Learning -
Machine Learning is a research area of computer science concerned with the
development of algorithms capable of learning from usually voluminous and
complex data sets. These algorithms can infer input-output relationships without
explicitly assuming a pre-determined model. The learning is mainly focused
on the discovery of predictive models and the automatic detection of patterns,
relationships and dependencies in the data, allowing the extraction of
implicit information that could hardly be detected otherwise.
Machine learning can efficiently deal with non-linear,
noisy environments and properly handle missing data. For these reasons,
machine learning algorithms are among the most widely used to integrate and
analyze omics data. There are three learning paradigms: supervised, unsupervised and
semi-supervised. Supervised learning is a process in which the model is
fitted using a set of observations, each associated with a known
outcome (label). In contrast, in unsupervised learning
one does not have access to the labels; it can be viewed as the task of
“spontaneously” finding patterns and structure in the input data.
The third paradigm, semi-supervised learning, combines both, typically using
a small amount of labeled data together with a larger amount of unlabeled data.
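A toy illustration of the supervised/unsupervised distinction, on made-up one-dimensional data (the data values, labels, and helper names are assumptions for illustration only):

```python
# Toy 1-D data: two groups, around 1 and around 10 (illustrative values).
points = [0.5, 1.0, 1.5, 9.0, 10.0, 10.5]
labels = ["low", "low", "low", "high", "high", "high"]  # known outcomes

# Supervised: a 1-nearest-neighbour classifier built from (point, label) pairs.
def predict(x):
    return min(zip(points, labels), key=lambda pl: abs(pl[0] - x))[1]

# Unsupervised: 2-means clustering on the raw points; labels are never used.
def two_means(xs, iters=20):
    c1, c2 = min(xs), max(xs)                   # simple deterministic initialisation
    for _ in range(iters):
        g1 = [x for x in xs if abs(x - c1) <= abs(x - c2)]
        g2 = [x for x in xs if abs(x - c1) > abs(x - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted((c1, c2))
```

The classifier exploits the known outcomes to predict a label for a new point, while the clustering step recovers the same two-group structure from the raw values alone, which is the essence of "spontaneously" finding patterns.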
Keywords - feature selection; dimensionality reduction;
interpretability in machine learning models; predictive models;
neuroevolution; learning heuristics.
HPC for Bioinformatics -
Depending on the problem, metaheuristics and machine learning approaches
can be computationally expensive, so that only small instances
of the problems can be solved. To overcome this, it is possible to
develop parallel models of metaheuristic and machine learning
algorithms, which allow a larger number of
plausible solutions to be explored. There are several ways to address the lack of
hard computing power for bioinformatics: (1) developing new and faster
heuristic algorithms (metaheuristics) that reduce the computational space
for the most time-consuming tasks; (2) incorporating these algorithms
into specialized chips; and (3) the most promising,
exploiting parallel computing. Parallel computing still requires
new paradigms in order to harness the additional processing power
for Bioinformatics. A recent trend in Structural Bioinformatics is
to move the algorithms from traditional, single-core processors to
multi-core processors and further to many-core or massively multi-core
processors. Data-parallel computations with high arithmetic intensity,
such as those found in Bioinformatics problems, can attain maximum performance
on Graphics Processing Units (GPUs). In such cases, when the algorithm
can be parallelized effectively, a significant speedup is obtained.
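To make the data-parallel pattern concrete, the sketch below emulates, serially and in plain Python, how a GPU kernel such as SAXPY (out = a*x + y) maps one thread index to one array element; on a real GPU (e.g., via CUDA) the per-index calls would run concurrently rather than in a loop:

```python
# Serial emulation of the data-parallel SAXPY kernel: out[i] = a * x[i] + y[i].
def saxpy_kernel(tid, a, x, y, out):
    # tid plays the role of blockIdx.x * blockDim.x + threadIdx.x in CUDA
    if tid < len(x):                   # guard: extra threads do nothing
        out[tid] = a * x[tid] + y[tid]

def launch(n_threads, kernel, *args):
    for tid in range(n_threads):       # on a GPU, these iterations run concurrently
        kernel(tid, *args)

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)
launch(8, saxpy_kernel, 2.0, x, y, out)   # out -> [12.0, 24.0, 36.0, 48.0]
```

Because each output element depends only on the inputs at the same index, all the per-index computations are independent, which is exactly the property that lets GPUs execute them in parallel with high arithmetic intensity.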
Keywords - GPU computing; massive parallel computing; CUDA.