Animal Health:
The use of bioinformatics tools in identifying molecular profiles of bacteria enables a precise
and efficient approach to disease diagnosis. Furthermore, it fosters a deeper understanding of
bacterial genetic diversity and facilitates well-informed clinical decision-making. In the field of
animal health, researchers study bacteria of the genus Brucella, which cause the disease
brucellosis. Also called Malta fever or undulant fever, brucellosis affects a wide range of
mammals, is zoonotic and distributed worldwide, and poses a significant risk to public health
while causing substantial economic losses. Its symptoms range from cold-like signs to
complications in the nervous system, musculoskeletal system, and heart. In dogs infected with
B. canis, the signs are nonspecific, as in humans, but reproductive failure and joint problems
associated with the bacterium are commonly diagnosed. Because the clinical signs are so
varied, diagnosing brucellosis in humans and animals is a significant challenge, and
underdiagnosis contributes to the spread of infection. Despite this, few genomic studies
comparing different strains of B. canis have been conducted so far. More information is
therefore needed on virulence factors, antimicrobial resistance genes, and the evolutionary
profile of the pathogen; such information can support government decision-making in public
health, as well as the storage and comparison of data about this agent.
On the experimental front of this project, team members recently sequenced 20 B. canis
genomes using two sequencing technologies (one producing short reads, the other long reads).
These genomes, together with 60 public genomes of B. canis and 160 public genomes of B. suis,
make up the data used to address this biological problem. The computational tools developed
in this proposal will analyze these data to identify species-specific genetic variations that
can serve as diagnostic markers for brucellosis. Interpretable machine learning algorithms will
be employed to create a genotypic profile of virulent strains and to distinguish them by
species, based on their phenotypic differences and antimicrobial susceptibility profiles.
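As a rough illustration of what a species-specific marker means in this context, the sketch below scans a toy presence/absence table and keeps the variants found in every genome of one species and in none of the other. The variant identifiers, genome names, and the function species_specific_variants are hypothetical and are not part of the project's tools; real input would come from variant calling on the sequenced and public genomes.

```python
from typing import Dict, List


def species_specific_variants(
    genotypes: Dict[str, Dict[str, int]],
    species_of: Dict[str, str],
    target: str,
) -> List[str]:
    """Return variants present (1) in all `target` genomes and absent (0) elsewhere.

    genotypes maps variant id -> {genome id -> 0/1 presence call}.
    """
    markers = []
    for variant, calls in genotypes.items():
        inside = [calls[g] for g in calls if species_of[g] == target]
        outside = [calls[g] for g in calls if species_of[g] != target]
        # Require data on both sides, fixation inside, and absence outside.
        if inside and outside and all(inside) and not any(outside):
            markers.append(variant)
    return sorted(markers)


# Toy, made-up data: two B. canis and two B. suis genomes (hypothetical names).
species_of = {
    "canis_1": "B. canis", "canis_2": "B. canis",
    "suis_1": "B. suis", "suis_2": "B. suis",
}
genotypes = {
    "var_A": {"canis_1": 1, "canis_2": 1, "suis_1": 0, "suis_2": 0},
    "var_B": {"canis_1": 1, "canis_2": 0, "suis_1": 0, "suis_2": 0},
    "var_C": {"canis_1": 1, "canis_2": 1, "suis_1": 1, "suis_2": 0},
}
canis_markers = species_specific_variants(genotypes, species_of, "B. canis")
print(canis_markers)
```

Only var_A qualifies in this toy table: var_B is missing from one B. canis genome and var_C also appears in a B. suis genome. With real data, a strict all-or-none rule like this would probably need to be relaxed to tolerate sequencing and calling errors.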
- Animal Health
- Personalized Medicine
- Phenotype Prediction
- Protein Phenotype Insights
Personalized Medicine:
The use of bioinformatics in personalized medicine, particularly in oncology, offers several
advantages. By integrating clinical, genetic, and genomic data, bioinformatics gives a more
comprehensive picture of each patient, contributing to more accurate diagnoses and
better-informed prognoses. Gene expression analysis and biomarker identification enable
treatment personalization, adapting therapeutic strategies to the specific genetic features of
each tumor. In addition, machine learning algorithms applied to complex data uncover
non-obvious patterns and relationships, advancing the identification of candidate drugs and
therapeutic targets. Bioinformatics thus plays a crucial role in the transition from
conventional to personalized and predictive medicine, with substantial benefits for early
diagnosis, effective treatment, and the overall management of cancer patients. The specific
objectives of this project are to select tumor biomarkers and to identify inhibitory compounds
as candidate drugs.
Therapeutic Targets and Tumor Biomarkers
This project centers on integrating machine learning and heuristic search methods to identify
tumor biomarkers, aiming to discover potential therapeutic targets. The goal is to develop
efficient machine learning techniques to handle the complexity of large-scale biological data,
contributing to the identification of biomarkers with diagnostic and prognostic value in
different types of cancer. The project seeks not only scientific advancements but also clinical
applicability, enhancing diagnostics, prognostics, and therapeutic planning.
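To make the idea of combining a learner with a heuristic search concrete, here is a minimal sketch of wrapper-style feature selection. Greedy forward selection stands in for the project's heuristic search methods, and a nearest-centroid classifier stands in for the machine learning model; the text specifies neither, so both are assumptions chosen for brevity. The expression values and function names are made up for illustration.

```python
from typing import List, Sequence, Tuple

Sample = Sequence[float]


def centroid_accuracy(X: List[Sample], y: List[int], features: List[int]) -> float:
    """Training accuracy of a nearest-centroid classifier restricted to `features`."""
    if not features:
        return 0.0
    labels = sorted(set(y))
    centroids = {}
    for label in labels:
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, lab in zip(X, y):
        def dist(label: int) -> float:
            # Squared Euclidean distance to the class centroid.
            return sum((x[f] - c) ** 2 for f, c in zip(features, centroids[label]))
        if min(labels, key=dist) == lab:
            correct += 1
    return correct / len(y)


def greedy_forward_selection(
    X: List[Sample], y: List[int], max_features: int
) -> Tuple[List[int], float]:
    """Add the single best feature at each step; stop when the score stops improving."""
    selected: List[int] = []
    best = 0.0
    n_features = len(X[0])
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        scored = [(centroid_accuracy(X, y, selected + [f]), f) for f in candidates]
        # Highest score wins; ties go to the lowest feature index.
        score, feature = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best:
            break
        selected.append(feature)
        best = score
    return selected, best


# Toy, made-up expression matrix: 6 samples x 3 genes; only gene 1 separates the classes.
X = [
    [5.0, 1.0, 3.0],
    [4.0, 1.2, 7.0],
    [6.0, 0.9, 5.0],
    [5.5, 3.0, 4.0],
    [4.5, 3.1, 6.0],
    [5.0, 2.9, 5.0],
]
y = [0, 0, 0, 1, 1, 1]
selected, best = greedy_forward_selection(X, y, max_features=2)
print(selected, best)
```

The score here is computed on the training samples, which is optimistic; with real high-dimensional data the scoring function would need cross-validation or a held-out set to avoid selecting biomarkers that merely fit noise.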
Drug Design
The project proposes using machine learning and heuristic search methods in drug selection to
combat chemotherapy resistance in cancer treatment, focusing on resistance to the
chemotherapy drug cisplatin. The research aims to identify potential inhibitors of DNA
polymerase eta (POLH), which is associated with cisplatin resistance, through virtual screening
methods enhanced by machine learning techniques. The identified inhibitors will be tested in
silico through molecular dynamics (MD) simulations, and new machine learning algorithms will
be evaluated as a way to reduce the need for bench experiments. The objective is to accelerate
inhibitor identification, lower drug development costs, and optimize the drug selection
process for chemoresistant cancers.
Phenotype Prediction:
Genotype-to-phenotype prediction is a crucial field in contemporary genetics, with important applications in forensic science and anthropological genetics. The search for markers capable of predicting externally visible characteristics (EVCs) has shown promise: predictors of skin, eye, and hair color from DNA, such as HIrisPlex-S, have been proposed with relative success. HIrisPlex-S, however, has shown low predictive power for Latin Americans. Improving performance requires accounting for the diversity of human populations as well as the methodological challenges of searching for genetic markers (SNPs) and developing predictors.
This project aims to construct predictors for skin, eye, and hair color. Our sample comprises 6,987 individuals and 651,871 SNPs from five Latin American countries (Brazil, Chile, Colombia, Mexico, and Peru), obtained through CANDELA (Consortium for the Analysis of the Diversity and Evolution of Latin America). The project's objectives are to develop a global classifier for externally visible characteristics (eye, skin, and hair color) covering the populations of these five countries, and to generate specific classifiers for each of the sampled populations.
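The distinction between a global classifier and population-specific classifiers can be sketched as follows. The classifier used here, a lookup from the genotype at a single SNP to its most frequent phenotype, is a deliberately tiny stand-in and not the project's method; the population labels, genotype codes, and phenotype categories are made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# (population, genotype at one SNP coded 0/1/2, phenotype category)
Record = Tuple[str, int, str]


def fit_lookup(records: List[Record]) -> Dict[int, str]:
    """Map each genotype to the phenotype most often observed with it."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for _, genotype, phenotype in records:
        counts[genotype][phenotype] += 1
    return {g: c.most_common(1)[0][0] for g, c in counts.items()}


def fit_global_and_local(
    records: List[Record],
) -> Tuple[Dict[int, str], Dict[str, Dict[int, str]]]:
    """Fit one model on all samples and one model per population."""
    by_population: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        by_population[record[0]].append(record)
    global_model = fit_lookup(records)
    local_models = {pop: fit_lookup(recs) for pop, recs in by_population.items()}
    return global_model, local_models


# Toy, made-up records for two hypothetical populations.
records = [
    ("pop_1", 0, "dark"), ("pop_1", 0, "dark"), ("pop_1", 2, "light"),
    ("pop_2", 0, "light"), ("pop_2", 2, "light"),
]
global_model, local_models = fit_global_and_local(records)
print(global_model)
print(local_models["pop_2"])
```

In this toy data the same genotype maps to different phenotypes in the pooled model and in the pop_2 model, which is the kind of population-dependent signal that motivates fitting both.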
Protein Phenotype Insights:
ProteinPhenotypeInsights (ProPhIn) is a Python package designed to assist evolutionary
biologists in understanding the relationships between candidate proteins and categorical
phenotypes in related species. ProPhIn employs machine learning techniques to unravel
genotype-phenotype relationships, focusing on interpretability and the generation of
visualization tools.
The program encodes missense variations in a way that facilitates the understanding of
potential epistatic interactions and selects those more likely to impact the phenotype in each
species. Visualization tools for variations aid in interpreting statistical associations, making it
easier to assess the biological plausibility of findings. ProPhIn also evaluates the two-
dimensional distribution of species and conducts network analyses, aiding in the
understanding of overall data behavior.
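The text does not describe how ProPhIn encodes missense variations, so the following is only a sketch of one common option: one-hot encoding of the residues found at variable columns of a protein alignment. The function name, the feature naming scheme, and the species labels are assumptions made for illustration and should not be read as ProPhIn's API.

```python
from typing import Dict, List, Tuple


def one_hot_variable_sites(
    alignment: Dict[str, str], protein: str
) -> Tuple[List[str], Dict[str, List[int]]]:
    """One-hot encode residues at alignment columns where species differ.

    alignment maps species -> aligned protein sequence (all the same length).
    Feature names have the form "<protein>_<1-based site>_<residue>".
    """
    species = sorted(alignment)
    length = len(alignment[species[0]])
    if any(len(alignment[s]) != length for s in species):
        raise ValueError("aligned sequences must have equal length")

    names: List[str] = []
    for site in range(length):
        residues = sorted({alignment[s][site] for s in species})
        if len(residues) > 1:  # invariant columns carry no signal
            names.extend(f"{protein}_{site + 1}_{aa}" for aa in residues)

    encoded: Dict[str, List[int]] = {}
    for s in species:
        row = []
        for name in names:
            _, site, aa = name.rsplit("_", 2)
            row.append(1 if alignment[s][int(site) - 1] == aa else 0)
        encoded[s] = row
    return names, encoded


# Toy alignment of a nine-residue peptide; species labels are hypothetical.
alignment = {
    "species_a": "CYIQNCPLG",
    "species_b": "CYIQNCPPG",
    "species_c": "CYIQNCPLG",
}
names, encoded = one_hot_variable_sites(alignment, "OXT")
print(names)
print(encoded)
```

Keeping one binary feature per site and residue means that a model's weight can be read back as "residue X at site N", which suits the interpretability goal described above. Gap characters would be treated as just another residue in this sketch and may need separate handling.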
We tested the software using the candidate genes OXT, OXTR, and LNPEP, and the phenotypes
social monogamy, paternal care, and litter size in primates. Our research group has been
studying these phenotypes and genes for about 10 years. When executing ProPhIn on our
database, the program identified 83.3% of the sites indicated by previous studies from our
team as potentially important due to their positions in molecules or signs of positive selection
and/or coevolution between OXT, OXTR, and LNPEP. Some of the identified variations have
already had their significance validated by functional studies. The program also discovered
new variations, potentially capable of explaining phenotypes in species less studied by our
research group.
This project aims to develop new bioinformatics tools based on Machine Learning methods (supervised and unsupervised), heuristic search methods, and high-performance computing to explore high-dimensional data in problems of scientific and economic interest in the area of human and animal health. We will develop: (i) algorithms based on adaptive and multiobjective metaheuristics; (ii) multimodal metaheuristics; (iii) time series-based metaheuristics; (iv) combinatorial optimization; (v) interpretable machine learning methods; (vi) algorithms for feature extraction and selection; and (vii) combination of interpretability methods aiming at building general-purpose strategies that contribute to the analysis of large data with complex structure...
Researchers
Dr. Márcio Dorn - Coordinator
Center for Biotechnology
Institute of Informatics - UFRGS - Brazil
Dr. Maria Cátira Bortolini
Institute of Biosciences
Department of Genetics - UFRGS - Brazil
Dr. Bruno Iochins Grisci
Center for Biotechnology
Institute of Informatics - UFRGS - Brazil
Dr. Manuel Escalona
Post Doc - INF/UFRGS - Brazil
Dr. Franciele Maboni Siqueira
Center for Biotechnology
Faculty of Veterinary - UFRGS - Brazil
Dr. Hugo Verli
Center for Biotechnology
Institute of Biosciences - UFRGS - Brazil
Dr. Juliana Silva Bernardes
LCQB/UPMC - France
Dr. Manuel Villalobos-Cid
DIINF/USACH - Chile
Dr. Mario Inostroza-Ponta
DIINF/USACH - Chile