
Conformational generation is a recurrent challenge in the early phases of drug design, mostly because it is hard to relate the number of conformers generated to their biological relevance.

To address this, ConfID, a Python-based computational tool, was designed to identify and characterize conformational populations of drug-like molecules sampled through molecular dynamics simulations.

By using molecular dynamics (MD) simulations (and assuming accurate parameters are used), ConfID can identify all conformational populations sampled in the presence of solvent and quantify their relative abundance, while harnessing the benefits of MD and calculating time-dependent properties of each conformational population identified.

To download ConfID, access: https://github.com/sbcblab/confid
To read a complete ConfID tutorial, access: http://sbcb.inf.ufrgs.br/confidtutorial
To contact us, please drop an email to confidcontact@gmail.com

What is ConfID?


It is a Python-based computational tool designed to identify and characterize conformational populations of small molecules sampled through molecular dynamics simulations.

ConfID was developed by:

  • Bruno I. Grisci - PhD student (Institute of Informatics - UFRGS)
  • Marcelo D. Polêto - Postdoctoral Researcher (General Biology Department - UFV)
  • Marcio Dorn - Associate Professor (Institute of Informatics - UFRGS)
  • Hugo Verli - Associate Professor (Center of Biotechnology - UFRGS)

Which problem was ConfID designed for?


Genetic algorithms and knowledge-based approaches have been employed to study molecular flexibility. However, these methods are usually based on crystallographic information, their calculations are made in vacuum or with implicit solvent, and they do not take into account the influence of explicit solvent molecules on conformational preferences.

By using MD simulations (and assuming accurate parameters are used), ConfID can identify all conformational populations sampled in the presence of solvent and quantify their relative abundance, while harnessing the benefits of MD and calculating time-dependent properties of each conformational population.

Just that? Really? Nothing else?


Well... It is not the exact purpose for which ConfID was conceived, but it can also be used to characterize conformational populations of molecules under structural restraint (as in a ligand-receptor simulation, for example). However, bear in mind that the relative frequencies of each population should be interpreted cautiously, since simulations of this type of system usually lack ergodicity. Be warned!

How does ConfID work?


If we consider the dihedral angles of a molecule throughout an MD simulation, a conformational population is the set of conformations sharing similar values for their respective dihedral angles. ConfID finds these conformational populations by following this procedure:

1 - The value of each dihedral angle is measured at every simulation time step, and the distribution of the angle (how much of the total simulated time was spent at each angle value) is computed. These distributions are smoothed using the Hann function with a sliding window of length 21°, yielding a curve with a well-behaved gradient.

2 - From this distribution, “peaks” and “valleys” are identified. A peak is a local maximum: an angle whose distribution value is larger than that of its immediate neighbors. A valley is either a local minimum or an angle whose distribution value falls below a given threshold, so low that the angle should be considered spurious.
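Steps 1 and 2 can be sketched as follows. This is only an illustration, not ConfID's actual code: it assumes a distribution sampled in 1° bins, wraps the ends so that the smoothing is periodic, and uses a single hypothetical `threshold` argument in place of the FACTOR_PEAK and FACTOR_VALLEY logic. The function names are made up for the example.

```python
import numpy as np


def smooth_hann(dist, window=21):
    """Smooth a periodic distribution with a normalized Hann window (odd length)."""
    dist = np.asarray(dist, dtype=float)
    weights = np.hanning(window)
    weights /= weights.sum()
    half = window // 2
    # Assumption: wrap the ends so that -180 and +180 degrees are neighbors.
    padded = np.concatenate([dist[-half:], dist, dist[:half]])
    return np.convolve(padded, weights, mode="valid")


def find_peaks_valleys(smooth, threshold):
    """Return (peaks, valleys) as arrays of bin indices."""
    left = np.roll(smooth, 1)    # value of the previous bin (periodic)
    right = np.roll(smooth, -1)  # value of the next bin (periodic)
    spurious = smooth < threshold
    # Peak: larger than both neighbors.
    # Assumption: bins below the threshold never count as peaks.
    peaks = np.where((smooth > left) & (smooth > right) & ~spurious)[0]
    # Valley: smaller than both neighbors, or below the spurious threshold.
    valleys = np.where(((smooth < left) & (smooth < right)) | spurious)[0]
    return peaks, valleys
```

The returned indices are bin positions; converting them back to angles depends on how the distribution file is binned.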

Now that we have the peaks and valleys, ConfID can identify the dihedral populations of each torsional bond: each one is a peak angle between two valleys, which corresponds to a region of high distribution. The conformational populations of a molecule are then characterized by the combination of all dihedral populations (identified by their peak values) occurring at the same time step, that is, by a tuple of n peaks, with n being the number of torsional bonds. Thus, all conformations represented by the same tuple of dihedral values belong to the same conformational population. The number of conformations that receive the same tuple determines the relative abundance of that population, and the number of different tuples is the number of different populations sampled throughout a simulation.
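The tuple-based labeling described above can be illustrated with a short sketch. It is not ConfID's implementation: the function names are hypothetical, valleys are assumed to be given as a sorted list of boundary angles per dihedral, `region_peaks[i]` is assumed to hold the peak of the region that ends at `valleys[i]` (with index 0 wrapping around ±180°), and spurious (Z) regions are ignored.

```python
from bisect import bisect_right
from collections import Counter


def region_peak(angle, valleys, region_peaks):
    """Map an angle to the peak of the region (between two valleys) it falls in."""
    # The modulo merges the regions below the first and above the last valley,
    # since dihedral angles are periodic.
    idx = bisect_right(valleys, angle) % len(valleys)
    return region_peaks[idx]


def conformational_populations(frames, valleys_per_dih, peaks_per_dih):
    """Label each frame with a tuple of peaks and compute relative abundances."""
    labels = [
        tuple(
            region_peak(angle, valleys, peaks)
            for angle, valleys, peaks in zip(frame, valleys_per_dih, peaks_per_dih)
        )
        for frame in frames
    ]
    counts = Counter(labels)
    abundance = {pop: n / float(len(labels)) for pop, n in counts.items()}
    return labels, abundance
```

For example, with two dihedrals that each have valleys at -120°, 0° and 120°, a frame with angles (-70°, 50°) would be labeled by the peaks of the regions containing those two values.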

How do I install ConfID?


ConfID installation is pretty straightforward. Just follow the steps:

a) INSTALLATION GUIDE

  • Download all files from https://github.com/sbcblab/confid and save them to your directory of choice.
  • ConfID requires a Python distribution. It is optimized for Python 2.7, but is also compatible with Python 3.x.
  • Open a terminal in the working directory and type:

    $ python check_dep.py

  • This will check for all Python libraries required to run ConfID.
  • ConfID requires the following external libraries:
    -> graphviz
    -> matplotlib
    -> numpy
  • If any of those are missing, you should install them before continuing.
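If you want to check the libraries by hand, a few lines of Python are enough. The sketch below is a hypothetical stand-in and does not reproduce what check_dep.py actually does; the function name is made up for the example.

```python
import importlib

# Libraries listed above as required by ConfID.
REQUIRED = ("graphviz", "matplotlib", "numpy")


def missing_modules(names=REQUIRED):
    """Return the names of the modules that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing libraries: %s" % (", ".join(missing_modules()) or "none"))
```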


b) ALIASING CONFID

To run ConfID in any given directory without carrying a lot of files with you, we strongly advise you to alias ConfID. For this:

  • Copy the "ConfID" folder that was extracted before;
  • Move it to an installation folder of your choice;
  • Then, add the following line to the end of your ~/.bashrc file:

    alias confID='python /installation_folder_you_chose/ConfID/confID.py'
    (example: alias confID='python /home/marcelo/Tools/ConfID/confID.py')

  • Then run on terminal:

    $ source ~/.bashrc

  • After this, you should be able to run ConfID in your terminal from any directory by simply typing:

    $ confID input.inp config

Which parameters can be used in the config file?


A configuration file (config) can be used to set ConfID parameters:

RESULTS_FOLDER (string) defaults to Populations/
DIH_POP_FOLDER (string) defaults to Dihedral_Regions/
NETWORK_FOLDER (string) defaults to Networks/
SHOW_Z (string) [False / True] defaults to False
NETWORK_CUTOFF (float) [>= 0.0] defaults to 0.01
PLOT_NETWORK (string) [False / True] defaults to False
CONVERGENCE_CUTOFF (float) [>= 0.0] defaults to 0.01
FACTOR_PEAK (float) [>= 1.0, < FACTOR_VALLEY] defaults to 50.0
FACTOR_VALLEY (float) [>= 1.0, > FACTOR_PEAK] defaults to 60.0
TIME_DEPENDENT_STATS (string) [False / True] defaults to True
DATA_1 (list of strings) [sum / max / min / aver / std / median / count] defaults to sum
DATA_2 (list of strings) [sum / max / min / aver / std / median / count] defaults to aver

A brief explanation of each parameter:

RESULTS_FOLDER: specifies the directory in which output files should be saved.
DIH_POP_FOLDER: specifies the directory in which output .xvg files should be saved.
NETWORK_FOLDER: specifies the directory in which output network files should be saved.
TIME_STATS_FOLDER: specifies the directory in which output transition files should be saved.
SHOW_Z: a flag that determines if spurious regions (Z) should be represented in the results. They will be used in the internal calculations nevertheless. If this is True, please consider setting PLOT_NETWORK to False, as plotting the chart may become too slow.
NETWORK_CUTOFF: the smallest transition frequency required for an edge to appear in the networks. If equal to 0.0, all edges are considered. If this cutoff is too small, please consider setting PLOT_NETWORK to False, as plotting the chart may become too slow.
PLOT_NETWORK: if True, network figures for the transitions will be created using the graphviz library. Network text files are created regardless of this setting.
CONVERGENCE_CUTOFF: the smallest population frequency at the end of the simulation required for the convergence file for that population to be generated. If equal to 0.0, all populations will be represented, but for a large number of dihedral angles, this can take a while.
FACTOR_PEAK: a factor that sets how strict the selection of peaks is. Larger values make the selection less strict.
FACTOR_VALLEY: a factor that sets how strict the selection of valleys is. Lower values make the selection less strict.
TIME_DEPENDENT_STATS: a flag that determines if the statistics of the time spent in each population should be computed.
DATA_1: list of functions used as the x-axis of the charts of the statistics of the time spent in each population. It also sets how the report is ordered.
DATA_2: list of functions used as the y-axis of the charts of the statistics of the time spent in each population. It also sets how the report is ordered.
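The defaults and constraints listed above can be summarized in a small validation sketch. It is only an illustration under stated assumptions: it works on a plain Python dictionary rather than on the config file itself (the file syntax is not covered here), it represents the True/False flags as Python booleans, it omits TIME_STATS_FOLDER because no default is listed for it, and `validate_config` is a hypothetical helper, not part of ConfID.

```python
# Defaults as documented above.
DEFAULTS = {
    "RESULTS_FOLDER": "Populations/",
    "DIH_POP_FOLDER": "Dihedral_Regions/",
    "NETWORK_FOLDER": "Networks/",
    "SHOW_Z": False,
    "NETWORK_CUTOFF": 0.01,
    "PLOT_NETWORK": False,
    "CONVERGENCE_CUTOFF": 0.01,
    "FACTOR_PEAK": 50.0,
    "FACTOR_VALLEY": 60.0,
    "TIME_DEPENDENT_STATS": True,
    "DATA_1": ["sum"],
    "DATA_2": ["aver"],
}
STAT_FUNCTIONS = ("sum", "max", "min", "aver", "std", "median", "count")


def validate_config(overrides):
    """Merge overrides into the defaults; return (config, unknown_keys, errors)."""
    config = dict(DEFAULTS)
    unknown = []
    for key, value in overrides.items():
        if key in DEFAULTS:
            config[key] = value
        else:
            unknown.append(key)  # would be ignored with a warning
    errors = []
    if config["FACTOR_PEAK"] < 1.0:
        errors.append("FACTOR_PEAK must be larger or equal to 1.0.")
    if config["FACTOR_VALLEY"] < 1.0:
        errors.append("FACTOR_VALLEY must be larger or equal to 1.0.")
    if config["FACTOR_VALLEY"] <= config["FACTOR_PEAK"]:
        errors.append("FACTOR_VALLEY must be larger than FACTOR_PEAK")
    for key in ("NETWORK_CUTOFF", "CONVERGENCE_CUTOFF"):
        if config[key] < 0.0:
            errors.append("%s must be >= 0.0" % key)
    for key in ("DATA_1", "DATA_2"):
        for name in config[key]:
            if name not in STAT_FUNCTIONS:
                errors.append("Unidentified function: %s" % name)
    return config, unknown, errors
```

Calling `validate_config({"FACTOR_PEAK": 70.0})` reports that FACTOR_VALLEY must be larger than FACTOR_PEAK, since the default FACTOR_VALLEY is 60.0.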

The functions available for time-dependent properties calculation are:

- sum: total time in a population
- max: maximum time spent in a population without leaving
- min: minimum time spent in a population without leaving
- aver: average time spent in a population without leaving
- std: standard deviation of the average time
- median: median time spent in a population without leaving
- count: number of transition events entering a population
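To make these definitions concrete, here is a minimal sketch that computes them from a sequence of per-frame population labels. It is not ConfID's code: `residence_stats` and `dt` (the time between frames) are hypothetical names, the population standard deviation is assumed for std, and the stay that starts at the first frame is counted as an entry.

```python
from itertools import groupby
from statistics import mean, median, pstdev


def residence_stats(labels, dt=1.0):
    """Compute per-population statistics of uninterrupted stays."""
    stays = {}
    # groupby yields one run per uninterrupted stay in a population.
    for pop, run in groupby(labels):
        stays.setdefault(pop, []).append(sum(1 for _ in run) * dt)
    return {
        pop: {
            "sum": sum(times),
            "max": max(times),
            "min": min(times),
            "aver": mean(times),
            "std": pstdev(times),  # assumption: population standard deviation
            "median": median(times),
            "count": len(times),   # assumption: includes a stay starting at frame 0
        }
        for pop, times in stays.items()
    }
```

For the label sequence A, A, B, A, A, A, B, B with 2 time units between frames, population A has two stays of 4 and 6 time units, so its sum is 10 and its count is 2.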

Which files can be listed in the input.inp file?


An input file (input.inp) can be used to set which input files will be analyzed by ConfID:

# Order: Distribution, Fluctuation
DIH1.dist.xvg, DIH1.aver.xvg
DIH2.dist.xvg, DIH2.aver.xvg
DIH3.dist.xvg, DIH3.aver.xvg
DIH4.dist.xvg, DIH4.aver.xvg

Note: the order "Distribution, Fluctuation" is important! ConfID splits each line at the comma.
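The format above can be read with a few lines of Python. This sketch is an illustration rather than ConfID's own parser: `parse_input` is a hypothetical name, and treating lines that start with # as comments is an assumption based on the example.

```python
def parse_input(text):
    """Return a list of (distribution_file, fluctuation_file) pairs."""
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and lines starting with '#' are skipped.
        if not line or line.startswith("#"):
            continue
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 2 or not all(parts):
            raise ValueError("expected 'distribution, fluctuation': %r" % raw)
        pairs.append((parts[0], parts[1]))
    return pairs
```

Each returned pair keeps the documented order, with the distribution file first and the fluctuation file second.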

Tutorials


ANA/PIK-75 in water

ConfID keeps crashing. What could possibly be wrong?


Assuming that all Python prerequisites were correctly installed (you can check it with check_dep.py) and you know what you are doing (have you tried our tutorials section?), the most common errors and warnings are:

  • “Segmentation Fault”
    In general, that means ConfID is using too much memory, most probably due to the number of frames saved in your inputs. You might want to try to close some applications or to reduce the frequency of frames in your inputs.
  • "ERROR: graphviz package needs to be installed if PLOT_NETWORK is True, but it couldn't be imported".
    This will happen if you try to plot networks within ConfID but do not have the Graphviz package installed on your system. You can either install Graphviz (https://graphviz.readthedocs.io/en/stable/manual.html) or set PLOT_NETWORK to False and open the generated .gml files in network visualization software such as Cytoscape.
  • "ERROR: matplotlib.pyplot package needs to be installed if TIME_DEPENDENT_STATS is True, but it couldn't be imported."
    This will happen if you don't have the matplotlib package installed in your system. You can solve this by installing matplotlib (https://matplotlib.org/3.1.0/users/installing.html) or setting TIME_DEPENDENT_STATS to False.
  • "ERROR: FACTOR_PEAK must be larger or equal to 1.0."
    To solve this problem, set the FACTOR_PEAK value in your config file to be larger than or equal to 1.0.
  • "ERROR: FACTOR_VALLEY must be larger or equal to 1.0."
    To solve this problem, set the FACTOR_VALLEY value in your config file to be larger than or equal to 1.0.
  • "ERROR: FACTOR_VALLEY must be larger than FACTOR_PEAK"
    To solve this problem, set the FACTOR_VALLEY value in your config file to be larger than FACTOR_PEAK.
  • "ERROR: No peaks were found! Try using a larger FP to solve the problem."
    This error will happen if you use a FACTOR_PEAK so small that no peaks could be found.
  • "ERROR: No valleys were found! Try using a lower FV to solve the problem."
    This error will happen if you use a FACTOR_VALLEY so large that no valleys could be found.
  • "ERROR: Unidentified function:"
    This error will happen when an unidentified function is passed as an argument to DATA_1 or DATA_2 in the config file. The possible values are sum, max, min, aver, std, median, and count.
  • "WARNING: Unidentified parameter ignored"
    This means that some parameter in your config file couldn't be recognized and was ignored. Please review the spelling of the parameters and check if they match the documentation.
  • "WARNING: plotting the graph figures may become too slow if the Z populations are to be shown, please consider setting either SHOW_Z or PLOT_NETWORK to False, or to use a large NETWORK_CUTOFF."
  • "WARNING: the frequency of the conformational populations in the stay stats may vary slightly from the previous results if Z populations are disregarded."
  • "WARNING: Did not plot network with more than 200 nodes."

I have found a bug during my many many many hours of ConfID usage. I demand to speak to the manager!


We apologize for any bugs you may be witnessing. We kindly ask you to send your inputs and a brief description of what you are trying to achieve (organized screenshots may help) to confidcontact@gmail.com and our team will deal with it as soon as possible!

Publications


Some papers already use ConfID! Here are a few:

  • Pablo R. Arantes, Marcelo D. Polêto, Elisa B. O. John, Conrado Pedebos, Bruno I. Grisci, Marcio Dorn, and Hugo Verli. Development of GROMOS-Compatible Parameter Set for Simulations of Chalcones and Flavonoids, The Journal of Physical Chemistry B 2019 123 (5), 994-1008, DOI: 10.1021/acs.jpcb.8b10139
  • Roberta Tesch, Christian Becker, Matthias P. Müller, Michael E. Beck, Lena Quambusch, Matthäus Getlik, Jonas Lategahn, Niklas Uhlenbrock, Fanny N. Costa, Marcelo D. Polêto, Pedro S.M. Pinheiro, Daniel A. Rodrigues, Carlos M.R. Sant'Anna, Fabio F. Ferreira, Hugo Verli, Carlos A.M. Fraga, Daniel Rauh. An Unusual Intramolecular Halogen Bond Guides Conformational Selection, Angew. Chem. Int. Ed. 2018, 57, 9970, DOI: 10.1002/anie.201804917

How to cite?


If you use ConfID in a scientific publication, we would appreciate citations to the following paper:

  • Marcelo D. Polêto, Bruno I. Grisci, Marcio Dorn, Hugo Verli. ConfID: an analytical method for conformational characterization of small molecules using molecular dynamics trajectories, Bioinformatics. 2019, Volume X, Issue X, Pages XXXX-XXXX, doi (in submission)

Image gallery