{% extends "layout.html" %} {% block title %}Dry-Lab Methods{% endblock %} {% block lead %}OUR METHODS OF ARTIFICAL SIMULATION OF MOLECULES AND THEIR INTERACTIONS{% endblock %} {% block page_content %}
Created by Vishwaa Kannan
Any files used or created by Amber or its related tools will be in the software repository for GCM-KY. Please retrieve any input/output files you require from there.
https://gitlab.igem.org/2024/software-tools/gcm-ky
The error with the previous year’s simulation was how I set Matplotlib to parse the data files. This was the previous code:
step = data[:,0] potential_energy = data[:,1] temperature = data[:,2] volume = data[:,3] kinetic_energy = data[:,4] total_energy = data[:,5]
The order was corrected to parse the data. This is the corrected code:
step = data[:,0] potential_energy = data[:,1] temperature = data[:,4] volume = data[:,5] kinetic_energy = data[:,2] total_energy = data[:,3]
The graph part was re-run to obtain the correct graphs for pfoa luxr:
The purpose of docking FAB and PFOA is to confirm that FAB works. We tested FAB in the lab and did not observe any noticeable fluorescence. So, we have docked these two into a complex, which we used for further calculations down the road.
We used AutoDock Vina on Windows, installed via their MSI installer.
https://vina.scripps.edu/downloads/
We used Alpha Fold 3 to fold the FAB sequence into a Protein Data Bank (PDB). Below is the URL and sequence we used.
MSFSGKYQLQSQENFEAFMKAIGLPEELIQKGKDIKGVSE
IVQNGKHFKFTITAGSGSGSHNVYIMADKQKNGIKVNFKI
RHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSAL
SKDPNEKRDHMVLLEFVTAAGITHGMDELYKGGTGGSMRK
GEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLT
LKFICTTGKLPVPWPTLVTTFGYGVQCFARYPDHMKQHDF
FKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRI
ELKGIDFKEDGNILGHKLEYNYNGGKVIQNEFTVGEECEL
ETMTGEKVKTVVQLEGDNKLVTTFKNIKSVTELNGDIITN
TMTLGDIVFKRISKRI
We also retrieved the Structured Data File (SDF) for PFOA from PubChem by downloading the 3D conformer here:
https://pubchem.ncbi.nlm.nih.gov/compound/9554\#section=2D-Structure
In a molecule viewer application, we used Chimera X (which can be installed via their app). We opened the FAB PDB, added hydrogens, and saved the output file as a PDB.
Tools > Structure Editing > Add Hydrogens > Also consider H-bonds (slower) > Ok
Then, in a new window, we opened the SDF for PFOA, deleted the hydrogen, and saved the output file as a PDB. This is essential, as PFOA is naturally deprotonated.
Select > Chemistry > Element > H
Actions > Atoms/Bonds > Delete
We then need to save our PDB files as AutoDock Vina files (PDBQT); this step is relatively simple.
To start AutoDock Vina, type “AutoDock Tools” in the search bar and run the application. After a short period, the GUI will load. Load the receptor PDB into AutoDock Vina by dragging it into the application. Then follow these commands to edit and save the receptor (FAB) file:
Edit > Hydrogens > Add > Polar only > noBondOrder > yes
Edit > Charges > Add Kollman Charges > Ok
Grid > Macromolecule > Choose > FAB > Select Molecule > Ok > Save file<
Next, we will prepare the ligand with these commands:
Right-click FAB > Delete
Drag Ligand PDB file into AutoDock Vina
Ligand > Input > Choose > PFOA
Ligand > Output > save > Ok
Next, delete PFOA from AutoDock Vina via:
Right-click PFOA > Delete
We must select the domain to which AutoDock Vina will attempt to dock PFOA.
Drag the .pdbqt file for the FAB and PFOA into a new AutoDock Vina Window. It is essential not to overwrite charges; click “ok” and keep the charges when importing.
Grid > Macromolecule > choose > FAB
Ligand > Input > Choose > PFOA
Grid > Gridbox
X-size: 60
Y-size: 60
Z-size: 60
X-center: 3.660
Y-center: -11.488
Z-center: 16.008
This should show a red, blue, and green box that occupies the binding pocket for FAB
Next, we will complete the docking procedure, open the command prompt, and type the following commands:
cd path/to/files/
touch config.txt
nano config.txt
receptor = fab.pdbqt
ligand = pfoa.pdbqt
center_x = 3.660
center_y = -11.488
center_z = 16.008
size_x = 60
size_y = 60
size_z = 60
energy_range = 20
exhaustiveness = 500
{ctrl + x}
Y
{enter}
Docking is an exhaustive study, so AutoDock Vina will try however many positions you tell it to and pick the best 10. The energy range represents the difference in binding affinity (Kcal/mol) between the best and the worst poses.
Finally, we can run AutoDock Vina. Find the path to Vina in your file manager and enter the following command into Command Prompt.
“c:path/to/vina” –receptor protein.pdbqt –ligand ligand.pdbqt –config config.txt –log log.txt –out output.pdbqt
To extract the best pose, open the output.pdbqt file in Chimera X and select the first model at the bottom right of the window. Then click save, saving only the selected model. You should now have an output. PDB file with your complex.
Input / Output files are located in the Software Tools Repository
https://gitlab.igem.org/2024/software-tools/gcm-ky
Now that we have an approximate dock of PFOA and FAB, we need to simulate it to ensure the bind is stable. To do this, we used Amber. Amber can be installed by following their guide on the website and their PDF document here:
https://ambermd.org/doc12/Amber24.pdf
Since we are just determining if the bind is stable, we simulated the complex in an implicit solvent system, as this system is less accurate but very efficient with resources.
Since we are working with the PFOA ligand (perfluorooctanoic acid), we must generate its charges and force field parameters. PFOA is deprotonated and thus should have a net charge of -1. Using the following antechamber commands, we will create a .mol2 file for PFOA and assign AM1-BCC charges suitable for small organic molecules in AMBER simulations. parmchk2 is then used to generate missing force field parameters.
antechamber -i pfoa.pdb -fi pdb -o pfoa.mol2 -fo mol2 -c bcc -nc -1 -s 2
parmchk2 -i pfoa.mol2 -f mol2 -o pfoa.frcmod
This creates two important files:
To generate the necessary topology (prmtop) and coordinate (rst7) files, we will use LEaP, specifying the appropriate force fields and water model.
LEaP Input (leap_s.in)
source leaprc.protein.ff19SB
source leaprc.gaff2
source leaprc.water.opc
UNL = loadmol2 pfoa.mol2
check UNL
loadamberparams pfoa.frcmod
saveoff UNL unl.lib
saveamberparm UNL pfoa.prmtop pfoa.rst7
quit
Force fields:
ff19SB: For the protein.
gaff2: General force field for organic molecules, like PFOA.
water.opc: For the water model used in implicit solvation.
This script prepares the prmtop and rst7 files for PFOA, which we’ll later combine with the protein for docking.
To run the script, use the following command:
tleap -s -f tleap_s.in
Now that we have the ligand’s force field parameters, we’ll combine it with the protein to form the PFOA-FAB complex.
LEaP Input (tleap_f.in)
source leaprc.protein.ff19SB
source leaprc.gaff2
source leaprc.water.opc
# Load the ligand
UNL = loadmol2 pfoa.mol2
loadamberparams pfoa.frcmod
loadoff unl.lib
# Load the protein
protein = loadpdb justFABnoHYDRO.pdb
# Combine the protein and ligand
complex = combine {protein UNL}
# Save the combined complex
saveamberparm complex pfoa_dock.prmtop pfoa_dock.rst7
savepdb complex pfoa_dock.pdb
quit
This script loads the ligand (pfoa.mol2) and the protein (justFABnoHYDRO.pdb), combines them into a single complex, and generates the topology and coordinate files needed for molecular dynamics (pfoa_dock.prmtop and pfoa_dock.rst7).
Use this command to run the script:
tleap -s -f tleap_f.in
The first step in the MD simulation process is minimization. We’ll perform an initial minimization to relieve any steric clashes or bad contacts between atoms in the system. Here, we use the igb=1 parameter to specify that we’re running a simulation of an implicit solvent environment.
Minimization Input (min.in)
&cntrl
imin=1, maxcyc=200, ncyc=50,
cut=16, ntb=0, igb=1,
&end
maxcyc=200: The maximum number of minimization cycles.
ncyc=50: The switch point for using steepest descent and conjugate gradient methods.
igb=1: Generalized Born implicit solvent model.
cut=16: Non-bonded cutoff distance (16 Å).
To run this simulation, use this command:
pmemd.cuda -O -i min.in -o min.out -p protein_solvated.prmtop -c protein_solvated.inpcrd -r min.rst -ref protein_solvated.inpcrd
After minimization, we will perform the production molecular dynamics (MD) run. This is just a longer equilibration. In this step, we will keep the system at 300 K and allow it to equilibrate. We will use a timestep of 2 fs, Langevin dynamics (ntt=3), and the igb=1 implicit solvation model.
Production Input (prod.in)
Production MD run
&cntrl
imin=0, irest=0,
nstlim=50000, dt=0.002, ntc=2, ntf=2,
ntpr=500, ntwx=500,
cut=16, ntb=0, igb=1,
ntt=3, gamma_ln=1.0,
tempi=0.0, temp0=300.0,
&end
nstlim=50000: Number of steps (corresponds to 100 ps for a 2 fs timestep).
ntpr=500, ntwx=500: Write output and trajectory files every 500 steps.
igb=1: Implicit solvent (Generalized Born model).
ntt=3, gamma_ln=1.0: Langevin thermostat with a collision frequency of 1.0 ps⁻¹./p>
temp0=300.0: Target temperature (300 K).
cut=16: Non-bonded cutoff distance (16 Å).
To run this step, use the following pmemd.cuda command:
-O -i prod.in -o prod.out -p pfoa_dock.prmtop -c pfoa_dock.rst7 -r prod.rst -x prod.nc
This will generate the MD trajectory (prod.nc) and output the final restart file (prod.rst).
After running the MD simulation, we can use the following input files to extract the final structures of the protein, ligand, and the entire complex.
Extract Full Complex (extract.in)
parm pfoa_dock.prmtop
trajin prod.nc
trajout final_structure.pdb pdb onlyframes 100
run
The following command extracts the entire complex at the 100th frame from the trajectory file.
cpptraj -i extract.in
Extract Ligand (extractLig.in)
parm pfoa_dock.prmtop
trajin prod.nc
strip !:UNL
trajout final_ligand.pdb pdb onlyframes 100
run
This command strips out everything except the ligand (UNL) and writes it to final_ligand.pdb.
cpptraj -i extractLig.in
Extract Protein (extractProt.in)
parm pfoa_dock.prmtop
trajin prod.nc
strip :UNL
trajout final_protein.pdb pdb onlyframes 100
run
This strips out ligand and writes the protein-only structure to final_protein.pdb.
cpptraj -i extractProt.in
The resulting PDBs from the extraction were viewed visually in ChimeraX, and it was confirmed that PFOA was still in the binding pocket.
It is important to note that this simulation was performed to confirm that the docking of PFOA to FAB was correct. However, not all proper procedures were followed, such as separating the production and equilibration, using implicit instead of explicit simulations, and shortened simulation times.
Input / Output files are in the Software Tools Repository (including the pipeline!).
https://gitlab.igem.org/2024/software-tools/gcm-ky
This document explains how to set up, run, and understand the MMPBSA pipeline, which simulates and analyzes the interactions between proteins and ligands, such as the FAB-PFOA complex, using the Molecular Mechanics Poisson-Boltzmann Surface Area method. It also covers performing mutagenesis with ChimeraX using rotamers and how to source and run the pipeline.
The script Simulation.sh automates a full molecular dynamics (MD) simulation pipeline, which uses an MPBSA analysis to calculate the binding affinity and give you a per-residue breakdown of the charge composition. The result from this pipeline can be pasted into an Excel sheet, and a pie graph will show which residues play the most significant role in the binding. This pipeline assumes you have already done the previous steps and obtained a pfoa.mol2 and a pfoa.frcmod.
Before running the pipeline, ensure the following are installed:
To run the pipeline, follow these steps:
Make the Simulation.sh executable:
chmod +x Simulation.sh
Run the script with the required arguments:
-r receptor.pdb: Path to the receptor PDB file.
-l ligand.pdb: Path to the ligand PDB file.
-c complex.pdb: Path to the complex PDB file (protein-ligand complex).
-n output_directory: Directory where output files will be stored.
-j num_cores: Number of cores to use for MMPBSA calculations.
./Simulation.sh -r protein.pdb -l ligand.pdb -c complex.pdb -n fab_pfoa_simulation -j 8
This creates an output directory, prepares input files, runs the MD simulation, and performs binding energy analysis.
The pipeline uses several key .in files for different stages of the simulation. These files control the simulation parameters, so understanding what each parameter does is critical.
min.in (Minimization)
&cntrl
imin=1, maxcyc=1000, ncyc=500,
cut=10.0, ntb=1,
ntc=2, ntf=2,
ntpr=100,
ntr=1, restraintmask=':1-376', restraint_wt=2.0
imin=1: Runs minimization (0 would run MD).
maxcyc=1000: Maximum number of minimization cycles.
ncyc=500: The first 500 cycles are steepest descent; the remaining are conjugate gradient.
cut=10.0: Cutoff distance for nonbonded interactions (10 Å).
ntb=1: Constant volume periodic boundary conditions (no pressure coupling).
ntc=2, ntf=2: Bonds involving hydrogen atoms are constrained with SHAKE.
npr=100: Print the output every 100 steps.
ntr=1: Apply harmonic restraints.
restraintmask=':1-376': Apply restraints to residues 1 to 376.
restraint_wt=2.0: Restraint weight of 2 kcal/mol/Ų.
heat.in (Heating)
&cntrl
imin=0, irest=0, ntx=1,
nstlim=25000, dt=0.002,
ntc=2, ntf=2,
cut=10.0, ntb=1,
ntpr=500, ntwx=500,
ntt=3, gamma_ln=2.0,
tempi=0.0, temp0=300.0, ig=-1,
ntr=1, restraintmask=':1-376', restraint_wt=2.0,
nmropt=1
/
&wt TYPE='TEMP0', istep1=0, istep2=25000,
value1=0.1, value2=300.0,
/
&wt TYPE='END'
/
imin=0: No minimization; we are running MD.
irest=0, ntx=1: No restart; start from initial coordinates.
nstlim=25000, dt=0.002: 25,000 steps with a 2 fs timestep.
ntt=3, gamma_ln=2.0: Langevin thermostat with a collision frequency of 2 ps⁻¹.
temp0=300.0: Target temperature is 300 K.
ntr=1, restraintmask=':1-376': Restraints on residues 1 to 376.
Temperature ramping: Temperature is ramped from 0.1 K to 300 K over 25,000 steps.
density.in (Density Equilibration)
&cntrl
imin=0, irest=1, ntx=5,
nstlim=25000, dt=0.002,
ntc=2, ntf=2,
cut=10.0, ntb=2, ntp=1, taup=1.0,
ntpr=500, ntwx=500,
ntt=3, gamma_ln=2.0,
temp0=300.0, ig=-1,ntr=1, restraintmask=':1-376', restraint_wt=2.0,
/
ntb=2: Constant pressure periodic boundary conditions.
ntp=1: Isotropic pressure scaling.
taup=1.0: Pressure relaxation time of 1 ps.
Other conditions similar to Heating.
equil.in (Equilibration)
&cntrl
imin=0, irest=1, ntx=5,
nstlim=250000, dt=0.002,
ntc=2, ntf=2,
cut=10.0, ntb=2, ntp=1, taup=2.0,
ntpr=1000, ntwx=1000,
ntt=3, gamma_ln=2.0,
temp0=300.0, ig=-1,
/
This is a longer equilibration step, with 250,000 steps (500 ps).
prod.in (Production)
&cntrl
imin=0, irest=1, ntx=5,
nstlim=10000000, dt=0.002,
ntc=2, ntf=2,
cut=10.0, ntb=2, ntp=1, taup=2.0,
ntpr=5000, ntwx=5000,
ntt=3, gamma_ln=2.0,
temp0=300.0, ig=-1,
/
nstlim=10000000, dt=0.002: Production run with 10 million steps, totaling 20 ns of MD simulation.
mmpbsa.in (MMPBSA Calculation)
&general
startframe=601, endframe=2000, keep_files=2,
/
&pb
istrng=0.100,
/
&decomp
idecomp=1,
/
startframe=601, endframe=2000: Start MMPBSA calculations from frame 601 to 2000.
istrng=0.100: Ionic strength is 0.1 M.
idecomp=1: Perform per-residue decomposition analysis.
Mutating residues in ChimeraX can be done as follows:
Open ChimeraX and load the structure:
open structure.pdb
Select the residue to mutate: In ChimeraX, click on the residue you want to mutate or use a selection command:
select :<residue_number>
Perform the mutation using rotamers:
swapaa <new_residue> :<residue_number> rotamer
For example, to mutate residue 45 to Alanine:
swapaa ala :45 rotamer
Save the mutated structure:
save mutated_structure.pdb
After mutating, the new structure can be run through the same pipeline using the Simulation.sh script.
Multiple simulations can be run on this with a few minor changes, with the main change required being the restraint mask. For example, I have used this pipeline to simulate FAB-GFP and PFOA, mutations of FAB-GFP and PFOA: ILE 308 → ARG, SER 349 → THR, and even for Palmitic Acid (the natural ligand to FAB) and FAB-GFP. However, this step required me to run palmitic acid through antechamber to get a forcefield for it. After each pipeline use, I pasted the resulting data into Excel and made a pie graph. This helps me see which residues contribute most to the ligand binding to the protein.
Created by Edward Kim
Any files used or created by OpenMM or its related tools will be in the software repository for GCM-KY. Please retrieve any input/output files you require from there.
https://gitlab.igem.org/2024/software-tools/gcm-ky
You need to install the Conda dependencies this script uses.
To set up the Conda environment, type the following command. Replace PATHTOPACAKGELISTTXT
with the path to the environment.yml file:
conda env create -f PATHTOPACAKGELISTTXT
To activate the environment:
conda activate testenv
To add executable permissions, run the following commands:
cd path_to_scoresolver
cd wham
chmod +x wham
To verify:
ls -l wham
If you see an error:
chmod: wham: No such file or directory
Compile wham.c
using:
make
--- title: Bind Coefficient Calculation --- graph LR; pdb[("test.pdb")] win["windows"] mt["metafile.txt & cv data"] pmf["pmf.txt"] traj["smd_traj.dcd"] wl["wham_logs.txt"] kd["$$Kd$$"] pdb -- "SMD Pulling" --> traj & win win -- "Umbrella Sampling" --> mt mt -- "WHAM" --> pmf & wl pmf -- "Binding Coeff. Algo" --> kd win -- "Binding Coeff. Algo" --> kd
If the histograms have poor overlap, you will need to use more windows and/or reduce the spring constant.
The windows do not need to be linearly spaced, and they can have different spring constants.
Important terms used:
ξ(r) = the progress of a reaction as function of atomic configuration/coordinates
Note: different atomic configurations r can have the same reaction coordinate due to ξ’s limited scope, especially in reversible reactions/processes. However, this is a necessary sacrifice because we need to reduce complexity and dimension of the problem. Essentially, ξ is a simplification of r.
F = −kBTln Z
In context, F can be expressed in terms of a specific state ξ, so F(ξ) = −kBTln P(ξ) + C
$$ {\displaystyle \delta _{a}(x)={\frac {1}{\left|a\right|{\sqrt {\pi }}}}e^{-(x/a)^{2}}} $$
In the case of this theory, it projects the space of r to the space of ξ. It picks out the relevant values of ξ(r) close to the value ξ in question and concentrates/amplifies those specific values of r.
P(ξ) = ∫P(r)δ(ξ − ξ(r))dr
Z = ∑energies of ith state weighed by boltzmann distribution factor = ∑ie−βEi
Encapsulates the total distribution of energy, which can later be used to normalize probability distributions over boltzmann factors.
$$ P(\textbf{r}) = \frac{\exp(-\beta E(r))}{Z} : \beta = \frac{1}{k_B T} $$
Here higher values of β, or coldness, makes states with lower temperatures more probable than otherwise. While lower values of β, leading to higher temperatures, makes states with higher temperatures more probable than otherwise. Regardles of positive β > 0, lower temperature states are more probable than higher temperature states. Note that λ ≡ β.
$$ P(\xi) = \int P(\textbf{r})\delta(\xi−\xi(\textbf{r}))d\textbf{r} = \frac{1}{Z} \int e^{-\beta E(\textbf{r})}\delta(\xi−\xi(\textbf{r}))d\textbf{r} $$
Which is the integration over all possible degrees of freedom of a reaction coordinate ξ
In all, its basically multiplying all the possible factors that affect the probability of having state ξ and integrating them because again, probability is a statistical quantity that is approximated.
$$ w_i(\xi) = \frac12 k(\xi - \xi_i)^2 $$
It acts a spring force with strength $k $ that restrains the system
simulation to state ξi, the center
of window i. For example, if
the system at ξ is a bout to
get away from ξi, a bias force
will force the system to get back to the window, resisting normal
eneergy landscape activity. If the strength of k is weak, the system may be able to
explore a little more; contrastly, if k too strong, it may not explore
enough of each window, so a balance is required. There are multiple
windows i = 1, 2, 3…n
that each sample sufficient amounts of the free energy landscape. This
is the main premise of umbrella sampling that separates from other MD
simulations.
NOTE: each quantity discussed previously must have biased version
indicated by an apostrophe superscript bias. After computing/simulating the system
with biasing potentials, we subtract the biased results by the biasing
potential and retrive the true energy curve.
$$ E_i^\text{bias}(\textbf{r}) = E_i(\textbf{r}) + w_i((\xi(\textbf{r}))) \quad [8] $$
$$ P_i^\text{bias}(\xi) \propto \int e^{-\beta (E_i^\text{bias}(\textbf{r}))} \delta (\xi - \xi(\textbf{r})) dr \propto P_i(\xi) e^{-\beta w_i(\xi)}, \quad [9] $$
$$ F^\text{bias}(\xi) = -k_B T \ln P_i(\xi) + w_i(\xi) + C_i = F_i(\xi) + w_i(\xi) + C_i. \quad [10] $$
Combining equations [8], [9], [10],
$$ F_i(\xi) = F_i^\text{bias}(\xi)-w_i(\xi) + C_i = -k_B T \text{log}(P_i^\text{bias}(\xi)) - w_i + C_i. $$
Calculating Binding Constant of a Ligand-Receptor System from the Free Energy Curve
Theoretically,
$$K_d = \frac{[L][R]}{[LR]}$$
where [L], [R], [LR] are the cocentrations of the ligand, receptor, and ligand-receptor complexes respectively. However, since this is computationally intractible to calculate, a more convenient equation would be a relationship between binding free energy and the Kd constant.$$ \Delta F^\circ = k_B T\ln K_\mathrm{d} \Rightarrow K_\mathrm{d} = e^{\frac{\Delta F^\circ}{k_B T}} $$
Where 1 M is 1 molar concentration and R is the universal gas constant. We determine ΔG∘ with the PMF graph we have obtanied. In order to do that, we can use this equation:
$$ \Delta G^\circ = -k_B T \ln \frac{\int_{\mathrm{pocket}} e^{-\frac{\Delta F(\xi)}{k_B T}} \, \mathrm{d}\xi}{\int_{\mathrm{bulk}} e^{-\frac{\Delta F(\xi)}{k_B T}} \, \mathrm{d}\xi} = -k_B T \frac{Z^\circ_{\mathrm{pocket}}}{Z^\circ_{\mathrm{bulk}}} $$
So, what we have right now is the F(ξ) curve. Recognizing these as the continous definitions of the partition function, we simplify it down to:
ΔG∘ = Fpocket − Fbulk
The most important parameters are:
Li and Lf: the starting and ending lengths of the complex you are trying to stretch
index1 and index2: the indexes of the two atoms we want to pivot as endpoints and stretch accordingly.
fcpull: the strength of bias
vpulling: the speed of exeucting the pulling loop to streth the two pivot atoms (the simulation will thus adjust the total time of the pulling loop based on Li and Lf as well)
Virtually, Li can be easily determined by looking at the original PDB file through a MD visualizer, and Lf can be any value Lf ≫ Li (by maybe a factor less than 102 ish) if you are stretching.
Additionally, since the simulation is purely discretized, vpulling should be picked at a low value to ensure it captures energy landscape adequetly.
However, the pivot indexes can be a little tricky to determine. But, the best way to start off is by identifying your objective and looking at the starting PDB to get a glipmse at what you’re working with.
Generally Pick the strongest parts of the Complex
1. For proteins: C-alpha atoms are picked because they're usually the backbone atoms that represent the structure very well, so stretching that wont break the system entirely (they are strong enough to stretch).
2. For nucleic acids: Phosphate atoms or specific base atoms (because again, they're the strongest backbones).
3. For small molecules or ligands: Select atoms that are involved in key interactions or are structurally significant.
If you are trying to get the binding affinity of a docked complex (which is what I am trying to do in this case,) I would select the center of mass of the ligand, and the key atom in a resiude in the binding pocket of the receptor.
As stored in the ./gh_scoresolver/tests/pdb/ahl_dock_luxr_1.pdb
PDB file, we have a ligand protein ahl docked into a receptor protein luxr.