{% extends "layout.html" %} {% block title %}Dry-Lab Methods{% endblock %} {% block lead %}OUR METHODS OF ARTIFICAL SIMULATION OF MOLECULES AND THEIR INTERACTIONS{% endblock %} {% block page_content %}

Molecular dynamics documentation

Created by Vishwaa Kannan

Any files used or created by Amber or its related tools will be in the software repository for GCM-KY. Please retrieve any input/output files you require from there.

https://gitlab.igem.org/2024/software-tools/gcm-ky


Fixing last year's simulations

The error with the previous year’s simulation was how I set Matplotlib to parse the data files. This was the previous code:

              step = data[:,0]
              potential_energy = data[:,1]
              temperature = data[:,2]
              volume = data[:,3]
              kinetic_energy = data[:,4]
              total_energy = data[:,5]
            

The order was corrected to parse the data. This is the corrected code:

              step = data[:,0]
              potential_energy = data[:,1]
              temperature = data[:,4]
              volume = data[:,5]
              kinetic_energy = data[:,2]
              total_energy = data[:,3]
            

The graph part was re-run to obtain the correct graphs for pfoa luxr:

one two three

Docking of Human Liver Fatty Acid Binding Protein (FAB) with PFOA

The purpose of docking FAB and PFOA is to confirm that FAB works. We tested FAB in the lab and did not observe any noticeable fluorescence. So, we have docked these two into a complex, which we used for further calculations down the road.

Preparation of docking

We used AutoDock Vina on Windows, installed via their MSI installer.

https://vina.scripps.edu/downloads/

We used Alpha Fold 3 to fold the FAB sequence into a Protein Data Bank (PDB). Below is the URL and sequence we used.

https://alphafoldserver.com/

MSFSGKYQLQSQENFEAFMKAIGLPEELIQKGKDIKGVSE
IVQNGKHFKFTITAGSGSGSHNVYIMADKQKNGIKVNFKI
RHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSAL
SKDPNEKRDHMVLLEFVTAAGITHGMDELYKGGTGGSMRK
GEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLT
LKFICTTGKLPVPWPTLVTTFGYGVQCFARYPDHMKQHDF
FKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRI
ELKGIDFKEDGNILGHKLEYNYNGGKVIQNEFTVGEECEL
ETMTGEKVKTVVQLEGDNKLVTTFKNIKSVTELNGDIITN
TMTLGDIVFKRISKRI

We also retrieved the Structured Data File (SDF) for PFOA from PubChem by downloading the 3D conformer here:

https://pubchem.ncbi.nlm.nih.gov/compound/9554\#section=2D-Structure

In a molecule viewer application, we used Chimera X (which can be installed via their app). We opened the FAB PDB, added hydrogens, and saved the output file as a PDB.


              Tools > Structure Editing > Add Hydrogens > Also consider H-bonds (slower) > Ok
            

Then, in a new window, we opened the SDF for PFOA, deleted the hydrogen, and saved the output file as a PDB. This is essential, as PFOA is naturally deprotonated.


              Select > Chemistry > Element > H  
              Actions > Atoms/Bonds > Delete 
            

Creating AutoDock Vina Files

We then need to save our PDB files as AutoDock Vina files (PDBQT); this step is relatively simple.

To start AutoDock Vina, type “AutoDock Tools” in the search bar and run the application. After a short period, the GUI will load. Load the receptor PDB into AutoDock Vina by dragging it into the application. Then follow these commands to edit and save the receptor (FAB) file:


              Edit > Hydrogens > Add > Polar only > noBondOrder > yes  
              Edit > Charges > Add Kollman Charges > Ok  
              Grid > Macromolecule > Choose > FAB > Select Molecule > Ok > Save file<
            

Next, we will prepare the ligand with these commands:


              Right-click FAB > Delete
              Drag Ligand PDB file into AutoDock Vina
              Ligand > Input > Choose > PFOA
              Ligand > Output > save > Ok
            

Next, delete PFOA from AutoDock Vina via:


              Right-click PFOA > Delete
            

Performing the docking

We must select the domain to which AutoDock Vina will attempt to dock PFOA.

Drag the .pdbqt file for the FAB and PFOA into a new AutoDock Vina Window. It is essential not to overwrite charges; click “ok” and keep the charges when importing.


              Grid > Macromolecule > choose > FAB  
              Ligand > Input > Choose > PFOA  
              Grid > Gridbox  
              X-size: 60  
              Y-size: 60  
              Z-size: 60  
              X-center: 3.660  
              Y-center: -11.488  
              Z-center: 16.008
            

This should show a red, blue, and green box that occupies the binding pocket for FAB

Next, we will complete the docking procedure, open the command prompt, and type the following commands:


              cd path/to/files/  
              touch config.txt  
              nano config.txt  
              receptor = fab.pdbqt  
              ligand = pfoa.pdbqt  
              center_x = 3.660  
              center_y = -11.488  
              center_z = 16.008  
              size_x = 60
              size_y = 60
              size_z = 60
              energy_range = 20
              exhaustiveness = 500
              {ctrl + x}
              Y
              {enter}
            

Docking is an exhaustive study, so AutoDock Vina will try however many positions you tell it to and pick the best 10. The energy range represents the difference in binding affinity (Kcal/mol) between the best and the worst poses.

Finally, we can run AutoDock Vina. Find the path to Vina in your file manager and enter the following command into Command Prompt.


              “c:path/to/vina” –receptor protein.pdbqt –ligand ligand.pdbqt –config config.txt –log log.txt –out output.pdbqt
            

To extract the best pose, open the output.pdbqt file in Chimera X and select the first model at the bottom right of the window. Then click save, saving only the selected model. You should now have an output. PDB file with your complex.

Confirmation of the docking of Fab-Gfp to PFOA

Input / Output files are located in the Software Tools Repository
https://gitlab.igem.org/2024/software-tools/gcm-ky

Now that we have an approximate dock of PFOA and FAB, we need to simulate it to ensure the bind is stable. To do this, we used Amber. Amber can be installed by following their guide on the website and their PDF document here:

https://ambermd.org/doc12/Amber24.pdf

Since we are just determining if the bind is stable, we simulated the complex in an implicit solvent system, as this system is less accurate but very efficient with resources.

Preparing the Simulation

Generating Force Fields with Antechamber

Since we are working with the PFOA ligand (perfluorooctanoic acid), we must generate its charges and force field parameters. PFOA is deprotonated and thus should have a net charge of -1. Using the following antechamber commands, we will create a .mol2 file for PFOA and assign AM1-BCC charges suitable for small organic molecules in AMBER simulations. parmchk2 is then used to generate missing force field parameters.


              antechamber -i pfoa.pdb -fi pdb -o pfoa.mol2 -fo mol2 -c bcc -nc -1 -s 2
              parmchk2 -i pfoa.mol2 -f mol2 -o pfoa.frcmod
            

This creates two important files:

  1. pfoa.mol2: contains the ligand structure with BCC charges.
  2. pfoa.frcmod: additional force field parameters not covered by the standard GAFF.

Generating Topology and Coordinate Files with LEaP

To generate the necessary topology (prmtop) and coordinate (rst7) files, we will use LEaP, specifying the appropriate force fields and water model.

LEaP Input (leap_s.in)


              source leaprc.protein.ff19SB
              source leaprc.gaff2
              source leaprc.water.opc
              UNL = loadmol2 pfoa.mol2
              check UNL
              loadamberparams pfoa.frcmod
              saveoff UNL unl.lib
              saveamberparm UNL pfoa.prmtop pfoa.rst7
              quit
            

Force fields:

ff19SB: For the protein.

gaff2: General force field for organic molecules, like PFOA.

water.opc: For the water model used in implicit solvation.

This script prepares the prmtop and rst7 files for PFOA, which we’ll later combine with the protein for docking.

To run the script, use the following command:


              tleap -s -f tleap_s.in
            

Creating the Complex

Now that we have the ligand’s force field parameters, we’ll combine it with the protein to form the PFOA-FAB complex.

LEaP Input (tleap_f.in)


              source leaprc.protein.ff19SB
              source leaprc.gaff2
              source leaprc.water.opc

              # Load the ligand
              UNL = loadmol2 pfoa.mol2

              loadamberparams pfoa.frcmod
              loadoff unl.lib

              # Load the protein
              protein = loadpdb justFABnoHYDRO.pdb

              # Combine the protein and ligand
              complex = combine {protein UNL}

              # Save the combined complex
              saveamberparm complex pfoa_dock.prmtop pfoa_dock.rst7
              savepdb complex pfoa_dock.pdb
              quit
            

This script loads the ligand (pfoa.mol2) and the protein (justFABnoHYDRO.pdb), combines them into a single complex, and generates the topology and coordinate files needed for molecular dynamics (pfoa_dock.prmtop and pfoa_dock.rst7).

Use this command to run the script:


              tleap -s -f tleap_f.in
            

Running the Simulation

Minimization (min.in)

The first step in the MD simulation process is minimization. We’ll perform an initial minimization to relieve any steric clashes or bad contacts between atoms in the system. Here, we use the igb=1 parameter to specify that we’re running a simulation of an implicit solvent environment.

Minimization Input (min.in)


              &cntrl
              imin=1, maxcyc=200, ncyc=50,
              cut=16, ntb=0, igb=1,
              &end
            

maxcyc=200: The maximum number of minimization cycles.

ncyc=50: The switch point for using steepest descent and conjugate gradient methods.

igb=1: Generalized Born implicit solvent model.

cut=16: Non-bonded cutoff distance (16 Å).

To run this simulation, use this command:


              pmemd.cuda -O -i min.in -o min.out -p protein_solvated.prmtop -c protein_solvated.inpcrd -r min.rst -ref protein_solvated.inpcrd
            

Production (prod.in)

After minimization, we will perform the production molecular dynamics (MD) run. This is just a longer equilibration. In this step, we will keep the system at 300 K and allow it to equilibrate. We will use a timestep of 2 fs, Langevin dynamics (ntt=3), and the igb=1 implicit solvation model.

Production Input (prod.in)

Production MD run


              &cntrl
              imin=0, irest=0,
              nstlim=50000, dt=0.002, ntc=2, ntf=2,
              ntpr=500, ntwx=500,
              cut=16, ntb=0, igb=1,
              ntt=3, gamma_ln=1.0,
              tempi=0.0, temp0=300.0,
              &end
            

nstlim=50000: Number of steps (corresponds to 100 ps for a 2 fs timestep).

ntpr=500, ntwx=500: Write output and trajectory files every 500 steps.

igb=1: Implicit solvent (Generalized Born model).

ntt=3, gamma_ln=1.0: Langevin thermostat with a collision frequency of 1.0 ps⁻¹./p>

temp0=300.0: Target temperature (300 K).

cut=16: Non-bonded cutoff distance (16 Å).

To run this step, use the following pmemd.cuda command:


              -O -i prod.in -o prod.out -p pfoa_dock.prmtop -c pfoa_dock.rst7 -r prod.rst -x prod.nc
            

This will generate the MD trajectory (prod.nc) and output the final restart file (prod.rst).

Extracting Structures

After running the MD simulation, we can use the following input files to extract the final structures of the protein, ligand, and the entire complex.

Extract Full Complex (extract.in)


              parm pfoa_dock.prmtop
              trajin prod.nc
              trajout final_structure.pdb pdb onlyframes 100
              run
            

The following command extracts the entire complex at the 100th frame from the trajectory file.


              cpptraj -i extract.in
            

Extract Ligand (extractLig.in)


              parm pfoa_dock.prmtop
              trajin prod.nc
              strip !:UNL
              trajout final_ligand.pdb pdb onlyframes 100
              run
            

This command strips out everything except the ligand (UNL) and writes it to final_ligand.pdb.


              cpptraj -i extractLig.in
            

Extract Protein (extractProt.in)


              parm pfoa_dock.prmtop
              trajin prod.nc
              strip :UNL
              trajout final_protein.pdb pdb onlyframes 100
              run
            

This strips out ligand and writes the protein-only structure to final_protein.pdb.


              cpptraj -i extractProt.in
            

Examination of structures

The resulting PDBs from the extraction were viewed visually in ChimeraX, and it was confirmed that PFOA was still in the binding pocket.

Conclusion

It is important to note that this simulation was performed to confirm that the docking of PFOA to FAB was correct. However, not all proper procedures were followed, such as separating the production and equilibration, using implicit instead of explicit simulations, and shortened simulation times.

Documentation for MD MMPBSA Pipeline

Input / Output files are in the Software Tools Repository (including the pipeline!).
https://gitlab.igem.org/2024/software-tools/gcm-ky

This document explains how to set up, run, and understand the MMPBSA pipeline, which simulates and analyzes the interactions between proteins and ligands, such as the FAB-PFOA complex, using the Molecular Mechanics Poisson-Boltzmann Surface Area method. It also covers performing mutagenesis with ChimeraX using rotamers and how to source and run the pipeline.

Pipeline Overview

The script Simulation.sh automates a full molecular dynamics (MD) simulation pipeline, which uses an MPBSA analysis to calculate the binding affinity and give you a per-residue breakdown of the charge composition. The result from this pipeline can be pasted into an Excel sheet, and a pie graph will show which residues play the most significant role in the binding. This pipeline assumes you have already done the previous steps and obtained a pfoa.mol2 and a pfoa.frcmod.

Prerequisites

Before running the pipeline, ensure the following are installed:

  1. AmberTools/AMBER (with CUDA support for pmemd.cuda).
  2. MPI (for MMPBSA parallelization).
  3. Python 3 (for data analysis scripts).
  4. ChimeraX (for mutagenesis).

Making the Script Executable

To run the pipeline, follow these steps:

Make the Simulation.sh executable:


              chmod +x Simulation.sh
            

Run the script with the required arguments:


              -r receptor.pdb: Path to the receptor PDB file.
              -l ligand.pdb: Path to the ligand PDB file.
              -c complex.pdb: Path to the complex PDB file (protein-ligand complex).
              -n output_directory: Directory where output files will be stored.
              -j num_cores: Number of cores to use for MMPBSA calculations.
            

Example Usage:


              ./Simulation.sh -r protein.pdb -l ligand.pdb -c complex.pdb -n fab_pfoa_simulation -j 8
            

This creates an output directory, prepares input files, runs the MD simulation, and performs binding energy analysis.

Explanation of Input Files

The pipeline uses several key .in files for different stages of the simulation. These files control the simulation parameters, so understanding what each parameter does is critical.

min.in (Minimization)


              &cntrl
              imin=1, maxcyc=1000, ncyc=500,
              cut=10.0, ntb=1,
              ntc=2, ntf=2,
              ntpr=100,
              ntr=1, restraintmask=':1-376', restraint_wt=2.0
            

imin=1: Runs minimization (0 would run MD).

maxcyc=1000: Maximum number of minimization cycles.

ncyc=500: The first 500 cycles are steepest descent; the remaining are conjugate gradient.

cut=10.0: Cutoff distance for nonbonded interactions (10 Å).

ntb=1: Constant volume periodic boundary conditions (no pressure coupling).

ntc=2, ntf=2: Bonds involving hydrogen atoms are constrained with SHAKE.

npr=100: Print the output every 100 steps.

ntr=1: Apply harmonic restraints.

restraintmask=':1-376': Apply restraints to residues 1 to 376.

restraint_wt=2.0: Restraint weight of 2 kcal/mol/Ų.

heat.in (Heating)


              &cntrl
              imin=0, irest=0, ntx=1,
              nstlim=25000, dt=0.002,
              ntc=2, ntf=2,
              cut=10.0, ntb=1,
              ntpr=500, ntwx=500,
              ntt=3, gamma_ln=2.0,
              tempi=0.0, temp0=300.0, ig=-1,
              ntr=1, restraintmask=':1-376', restraint_wt=2.0,
              nmropt=1
              /
              &wt TYPE='TEMP0', istep1=0, istep2=25000,
              value1=0.1, value2=300.0, 
              /
              &wt TYPE='END'
              / 
            

imin=0: No minimization; we are running MD.

irest=0, ntx=1: No restart; start from initial coordinates.

nstlim=25000, dt=0.002: 25,000 steps with a 2 fs timestep.

ntt=3, gamma_ln=2.0: Langevin thermostat with a collision frequency of 2 ps⁻¹.

temp0=300.0: Target temperature is 300 K.

ntr=1, restraintmask=':1-376': Restraints on residues 1 to 376.

Temperature ramping: Temperature is ramped from 0.1 K to 300 K over 25,000 steps.

density.in (Density Equilibration)


              &cntrl
              imin=0, irest=1, ntx=5,
              nstlim=25000, dt=0.002,
              ntc=2, ntf=2,
              cut=10.0, ntb=2, ntp=1, taup=1.0,
              ntpr=500, ntwx=500,
              ntt=3, gamma_ln=2.0,
              temp0=300.0, ig=-1,ntr=1, restraintmask=':1-376', restraint_wt=2.0,
              /
            

ntb=2: Constant pressure periodic boundary conditions.

ntp=1: Isotropic pressure scaling.

taup=1.0: Pressure relaxation time of 1 ps.

Other conditions similar to Heating.

equil.in (Equilibration)


              &cntrl
              imin=0, irest=1, ntx=5,
              nstlim=250000, dt=0.002,
              ntc=2, ntf=2,
              cut=10.0, ntb=2, ntp=1, taup=2.0,
              ntpr=1000, ntwx=1000,
              ntt=3, gamma_ln=2.0,
              temp0=300.0, ig=-1,
              /
            

This is a longer equilibration step, with 250,000 steps (500 ps).

prod.in (Production)


              &cntrl
              imin=0, irest=1, ntx=5,
              nstlim=10000000, dt=0.002,
              ntc=2, ntf=2,
              cut=10.0, ntb=2, ntp=1, taup=2.0,
              ntpr=5000, ntwx=5000,
              ntt=3, gamma_ln=2.0,
              temp0=300.0, ig=-1,
              /
            

nstlim=10000000, dt=0.002: Production run with 10 million steps, totaling 20 ns of MD simulation.

mmpbsa.in (MMPBSA Calculation)


              &general
              startframe=601, endframe=2000, keep_files=2,
              /
              &pb
              istrng=0.100,
              /
              &decomp
              idecomp=1,
              /
            

startframe=601, endframe=2000: Start MMPBSA calculations from frame 601 to 2000.

istrng=0.100: Ionic strength is 0.1 M.

idecomp=1: Perform per-residue decomposition analysis.

Mutagenesis Using ChimeraX with Rotamers

Mutating residues in ChimeraX can be done as follows:

Open ChimeraX and load the structure:


              open structure.pdb
            

Select the residue to mutate: In ChimeraX, click on the residue you want to mutate or use a selection command:


              select :<residue_number>
            

Perform the mutation using rotamers:


              swapaa <new_residue> :<residue_number> rotamer
            

For example, to mutate residue 45 to Alanine:


              swapaa ala :45 rotamer
            

Save the mutated structure:


              save mutated_structure.pdb
            

After mutating, the new structure can be run through the same pipeline using the Simulation.sh script.

Conclusion

Multiple simulations can be run on this with a few minor changes, with the main change required being the restraint mask. For example, I have used this pipeline to simulate FAB-GFP and PFOA, mutations of FAB-GFP and PFOA: ILE 308 → ARG, SER 349 → THR, and even for Palmitic Acid (the natural ligand to FAB) and FAB-GFP. However, this step required me to run palmitic acid through antechamber to get a forcefield for it. After each pipeline use, I pasted the resulting data into Excel and made a pie graph. This helps me see which residues contribute most to the ligand binding to the protein.

Umbrella Sampling

Created by Edward Kim

Any files used or created by OpenMM or its related tools will be in the software repository for GCM-KY. Please retrieve any input/output files you require from there.

https://gitlab.igem.org/2024/software-tools/gcm-ky


Dependencies

You need to install the Conda dependencies this script uses.

  • Conda
  • Pip
  • OpenMM

Conda Setup

To set up the Conda environment, type the following command. Replace PATHTOPACAKGELISTTXT with the path to the environment.yml file:

conda env create -f PATHTOPACAKGELISTTXT

To activate the environment:

conda activate testenv

Add Permissions for Wham Subprocess

MacOS

To add executable permissions, run the following commands:

              cd path_to_scoresolver 
                cd wham 
                chmod +x wham
              
            

To verify:

ls -l wham

If you see an error:

chmod: wham: No such file or directory

Compile wham.c using:

make

Program Flow

            ---
            title: Bind Coefficient Calculation
            ---
            graph LR;
                pdb[("test.pdb")]
                win["windows"]
                mt["metafile.txt & cv data"]
                pmf["pmf.txt"]
                traj["smd_traj.dcd"]
                wl["wham_logs.txt"]
                kd["$$Kd$$"]
                pdb -- "SMD Pulling" --> traj & win
                win -- "Umbrella Sampling" --> mt
                mt -- "WHAM" --> pmf & wl
                pmf -- "Binding Coeff. Algo" --> kd
                win -- "Binding Coeff. Algo" --> kd
            

Troubleshooting

If the histograms have poor overlap, you will need to use more windows and/or reduce the spring constant.

The windows do not need to be linearly spaced, and they can have different spring constants.

Theory

Definitions

Important terms used:

  • F: Free energy
  • kB: Boltzmann constant
  • T: Temperature
  • Z: Canonical partition function
  • ξ: Collective Variable (CV) represents the progress of a reaction into a single value of a state
  • r: Atomic coordinates
  • P(ξ): Probability distribution of the system as a function of CV
  • β: Inverse temperature ($= 1 / k_B T $)
  • U(r): Potential energy of the system
  • Uibias: Biasing potential for window $i $
  • k: Force constant for the biasing potential
  • ξi: Target value of the CV in window $i $
  • Pibias(ξ): Biased probability distribution for window $i $
  • ni(ξ): Number of samples in bin ξ for window $i $
  • Fibias(ξ): Biased free energy in window $i $
  • F(ξ): Unbiased free energy (Potential of Mean Force, PMF)
  • δa(x): Dirac delta function

Equations

  1. Reaction Coordinate

ξ(r) = the progress of a reaction as function of atomic configuration/coordinates

Note: different atomic configurations r can have the same reaction coordinate due to ξ’s limited scope, especially in reversible reactions/processes. However, this is a necessary sacrifice because we need to reduce complexity and dimension of the problem. Essentially, ξ is a simplification of r.

  1. Free energy in the canonical ensemble:

F = −kBTln Z

In context, F can be expressed in terms of a specific state ξ, so F(ξ) = −kBTln P(ξ) + C

  1. Dirac Delta Function: ( δ(x) = δ1(x) ):

$$ {\displaystyle \delta _{a}(x)={\frac {1}{\left|a\right|{\sqrt {\pi }}}}e^{-(x/a)^{2}}} $$

In the case of this theory, it projects the space of r to the space of ξ. It picks out the relevant values of ξ(r) close to the value ξ in question and concentrates/amplifies those specific values of r.

P(ξ) = ∫P(r)δ(ξ − ξ(r))dr

  1. Partition Function (discrete):

Z = ∑energies of ith state weighed by boltzmann distribution factor = ∑ieβEi

Encapsulates the total distribution of energy, which can later be used to normalize probability distributions over boltzmann factors.

  1. Probability Distribution in r

$$ P(\textbf{r}) = \frac{\exp(-\beta E(r))}{Z} : \beta = \frac{1}{k_B T} $$

Here higher values of β, or coldness, makes states with lower temperatures more probable than otherwise. While lower values of β, leading to higher temperatures, makes states with higher temperatures more probable than otherwise. Regardles of positive β > 0, lower temperature states are more probable than higher temperature states. Note that λ ≡ β.

  1. Probability Distribution (continous):

$$ P(\xi) = \int P(\textbf{r})\delta(\xi−\xi(\textbf{r}))d\textbf{r} = \frac{1}{Z} \int e^{-\beta E(\textbf{r})}\delta(\xi−\xi(\textbf{r}))d\textbf{r} $$

Which is the integration over all possible degrees of freedom of a reaction coordinate ξ

In all, its basically multiplying all the possible factors that affect the probability of having state ξ and integrating them because again, probability is a statistical quantity that is approximated.

Adding Biasing

$$ w_i(\xi) = \frac12 k(\xi - \xi_i)^2 $$

It acts a spring force with strength $k $ that restrains the system simulation to state ξi, the center of window i. For example, if the system at ξ is a bout to get away from ξi, a bias force will force the system to get back to the window, resisting normal eneergy landscape activity. If the strength of k is weak, the system may be able to explore a little more; contrastly, if k too strong, it may not explore enough of each window, so a balance is required. There are multiple windows i = 1, 2, 3…n that each sample sufficient amounts of the free energy landscape. This is the main premise of umbrella sampling that separates from other MD simulations.
NOTE: each quantity discussed previously must have biased version indicated by an apostrophe superscript bias. After computing/simulating the system with biasing potentials, we subtract the biased results by the biasing potential and retrive the true energy curve.

  1. Energy at window i to unbiased version:

$$ E_i^\text{bias}(\textbf{r}) = E_i(\textbf{r}) + w_i((\xi(\textbf{r}))) \quad [8] $$

  1. Probability distribution at window i to unbiased version:

$$ P_i^\text{bias}(\xi) \propto \int e^{-\beta (E_i^\text{bias}(\textbf{r}))} \delta (\xi - \xi(\textbf{r})) dr \propto P_i(\xi) e^{-\beta w_i(\xi)}, \quad [9] $$

  1. Free Energy at window i to unbiased version:

$$ F^\text{bias}(\xi) = -k_B T \ln P_i(\xi) + w_i(\xi) + C_i = F_i(\xi) + w_i(\xi) + C_i. \quad [10] $$

Combining equations [8], [9], [10],

$$ F_i(\xi) = F_i^\text{bias}(\xi)-w_i(\xi) + C_i = -k_B T \text{log}(P_i^\text{bias}(\xi)) - w_i + C_i. $$

Calculating Binding Constant of a Ligand-Receptor System from the Free Energy Curve

Theoretically,

$$K_d = \frac{[L][R]}{[LR]}$$

where [L], [R], [LR] are the cocentrations of the ligand, receptor, and ligand-receptor complexes respectively. However, since this is computationally intractible to calculate, a more convenient equation would be a relationship between binding free energy and the Kd constant.

  1. Absolute Binding Free Energy

$$ \Delta F^\circ = k_B T\ln K_\mathrm{d} \Rightarrow K_\mathrm{d} = e^{\frac{\Delta F^\circ}{k_B T}} $$

Where 1 M is 1 molar concentration and R is the universal gas constant. We determine ΔG with the PMF graph we have obtanied. In order to do that, we can use this equation:

$$ \Delta G^\circ = -k_B T \ln \frac{\int_{\mathrm{pocket}} e^{-\frac{\Delta F(\xi)}{k_B T}} \, \mathrm{d}\xi}{\int_{\mathrm{bulk}} e^{-\frac{\Delta F(\xi)}{k_B T}} \, \mathrm{d}\xi} = -k_B T \frac{Z^\circ_{\mathrm{pocket}}}{Z^\circ_{\mathrm{bulk}}} $$

  • pocket := {ξ : ligand at state ξ is bound to the receptor}
  • bulk := {ξ : ligand at state ξ is not interacting with the receptor}

So, what we have right now is the F(ξ) curve. Recognizing these as the continous definitions of the partition function, we simplify it down to:

ΔG = Fpocket − Fbulk

Selecting Parameters

The most important parameters are:

  1. Li and Lf: the starting and ending lengths of the complex you are trying to stretch

  2. index1 and index2: the indexes of the two atoms we want to pivot as endpoints and stretch accordingly.

  3. fcpull: the strength of bias

  4. vpulling: the speed of exeucting the pulling loop to streth the two pivot atoms (the simulation will thus adjust the total time of the pulling loop based on Li and Lf as well)

Virtually, Li can be easily determined by looking at the original PDB file through a MD visualizer, and Lf can be any value Lf ≫ Li (by maybe a factor less than 102 ish) if you are stretching.

Additionally, since the simulation is purely discretized, vpulling should be picked at a low value to ensure it captures energy landscape adequetly.

However, the pivot indexes can be a little tricky to determine. But, the best way to start off is by identifying your objective and looking at the starting PDB to get a glipmse at what you’re working with.

Generally Pick the strongest parts of the Complex

        
                1. For proteins: C-alpha atoms are picked because they're usually the backbone atoms that represent the structure very well, so stretching that wont break the system entirely (they are strong enough to stretch).
                2. For nucleic acids: Phosphate atoms or specific base atoms (because again, they're the strongest backbones).
                3. For small molecules or ligands: Select atoms that are involved in key interactions or are structurally significant.
              
If you are trying to get the binding affinity of a docked complex (which is what I am trying to do in this case,) I would select the center of mass of the ligand, and the key atom in a resiude in the binding pocket of the receptor. As stored in the
./gh_scoresolver/tests/pdb/ahl_dock_luxr_1.pdb
PDB file, we have a ligand protein ahl docked into a receptor protein luxr.

{% endblock %}