MD Simulation


Abstract


Using the Molecular Dynamics (MD) simulation on the Cascade-crRNA-DNA complex and the Cas12a-crRNA-DNA complex, to compare the RMSF and the number of hydrogen bonds formed between protein and crRNA of different combinations. From the comparison, it was predicted that the stability at the equilibrium is at the highest when the DNA recognized by the Cas12a is the ds-template3. We assumed that the more stable the complex, the higher the CRISPR-Cas's ability to detect, and thus concluded that 3step-SDA is the most desirable amplification method. This result aligns with the conclusion regarding the preferable amplification method as discussed in Dry Lab_Model_Amplification_comparison.

Purpose


We plan to use either CRISPR-Cas3 or CRISPR-Cas12a in the final stage of detecting the target miRNA. Additionally, there are several options for the crRNA, and there is flexibility in the number of multistep-SDA stages that can be inserted before connecting to CRISPR-Cas. Here, we will consider the most desirable options by varying the crRNA and the complementary DNA to the crRNA for both CRISPR-Cas3 and CRISPR-Cas12a.

Methods


The stability of the CRISPR-Cas3 complex will be evaluated using Molecular Dynamics (MD) simulation for the complex of crRNA, DNA, and Cascade, while the stability of the CRISPR-Cas12a complex will be evaluated for the crRNA-DNA-Cas12a complex. Table 1. shows the combinations of Protein, crRNA, and DNA used in the simulations. After connecting n step SDA and dsDNA elongation, we call the dsDNA recognized by CRISPR-Cas as ds-template.

\[ \begin{array}{|c|c|c|} \hline \textbf{Protein} & \textbf{crRNA} & \textbf{DNA} \\ \hline \text{Cascade} & \text{SARS-CoV-2 N2} & \text{Positive Control (PC)} \\ \hline \text{Cascade} & \text{SARS-CoV-2 N2} & \text{ds-template2} \\ \hline \text{Cas12a} & \text{EMX1 crRNA} & \text{ds-template0} \\ \hline \text{Cas12a} & \text{EMX1 crRNA} & \text{ds-template1} \\ \hline \text{Cas12a} & \text{EMX1 crRNA} & \text{ds-template2} \\ \hline \text{Cas12a} & \text{EMX1 crRNA} & \text{ds-template3} \\ \hline \text{Cas12a} & \text{SARS-CoV-2 N2 crRNA} & \text{Positive Control (PC)} \\ \hline \text{Cas12a} & \text{SARS-CoV-2 N2 crRNA} & \text{ds-template0} \\ \hline \text{Cas12a} & \text{SARS-CoV-2 N2 crRNA} & \text{ds-template1} \\ \hline \text{Cas12a} & \text{SARS-CoV-2 N2 crRNA} & \text{ds-template2} \\ \hline \text{Cas12a} & \text{SARS-CoV-2 N2 crRNA} & \text{ds-template3} \\ \hline \end{array} \]

Table 1. crRNA sequences for different proteins and species

The initial structure of MD simulation uses structure prediction from AlphaFold31. The amino acid sequences of Cascade and Cas12a are the same as the protein of PDB ID:5H9F and PDB ID:5ID6, respectively. The base sequences of crRNA and DNA are shown in Table 2. and 3., respectively.

In the MD simulation of Cascade and Cas12a, GROMACS and GENESIS were used, respectively 2, 3, 4, 5, 6, 7, 8, 9. The initial structure does not contain water. After the solvation, ionization, energy minimization, and equilibration with constant pressure and temperature of the protein, a full 2 ns molecular dynamics simulation was conducted.

Sequences

Protein Species Sequence (5' to 3')
Cascade SARS-CoV-2 N2 crRNA GUGUUCCCCGCGCCAGCGGGGAUAAACCGUGCAAUUUGCG
GCCAAUGUUUGUAAUCAGUUC
GUGUUCCCCGCGCCAGCGGGGAUAAACCG
Cas12a EMX1 AAUUUCUACUAAGUGUAGAUAAUGACUAGGGUGGGCAACCA
Cas12a SARS-CoV-2 N2 AAUUUCUACUAAGUGUAGAUGAACGCUGAAGCGCUGGGGG

Table 2. crRNA sequences for different proteins and species

Protein Species Name Sequence (5' to 3')
Cascade SARS-CoV-2 N2 PC AGGAACTAATCAGACAAGGAACTGATTACAAACATTGGCCGCAAATTGCACAATTTGCCC | GGGCAAATTGTGCAATTTGCGGCCAATGTTTGTAATCAGTTCCTTGTCTGATTAGTTCCT
Cascade SARS-CoV-2 N2 ds-template2 TTGAAGGAACTGATTACAAACATTGGCCGCAAATTGCATTCCTGCTGAACTGAGCCACCTCA | TGAGGTGGCTCAGTTCAGCAGGAATGCAATTTGCGGCCAATGTTTGTAATCAGTTCCTTCAA
Cascade SARS-CoV-2 N2 ds-template3 TTGAAGGAACTGATTACAAACATTGGCCGCAAATTGCAGCCCTGTACAATGCTGCTCCTCA | TGAGGAGCAGCATTGTACAGGGCTGCAATTTGCGGCCAATGTTTGTAATCAGTTCCTTCAA
Cas12a EMX1 ds-template0 CTTATCAGACTGATGTTGATTTGTGGTTGCCCACCCTAGTCATTCAATGACTAG | CTAGTCATTGAATGACTAGGGTGGGCAACCACAAATCAACATCAGTCTGATAAG
Cas12a EMX1 ds-template1 ACATTGTCTGCTGGGTTTCTTTGTGGTTGCCCACCCTAGTCATTCAATGACTAG | CTAGTCATTGAATGACTAGGGTGGGCAACCACAAAGAAACCCAGCAGACAATGT
Cas12a EMX1 ds-template2 TGGCTCAGTTCAGCAGGAATTTGTGGTTGCCCACCCTAGTCATTCAATGACTAG | CTAGTCATTGAATGACTAGGGTGGGCAACCACAAATTCCTGCTGAACTGAGCCA
Cas12a EMX1 ds-template2 TGGCTCAGTTCAGCAGGAATTTGTGGTTGCCCACCCTAGTCATTCAATGACTAG | CTAGTCATTGAATGACTAGGGTGGGCAACCACAAATTCCTGCTGAACTGAGCCA
Cas12a EMX1 ds-template3 AGCAGCATTGTACAGGGCTTTTGTGGTTGCCCACCCTAGTCATTCAATGACTAG | CTAGTCATTGAATGACTAGGGTGGGCAACCACAAAAGCCCTGTACAATGCTGCT
Cas12a SARS-CoV-2 N2 PC GCCGCAAATTGCACAATTTGCCCCCAGCGCTTCAGCGTTCTTCGGAATGTCGCAGCTTGG | CCAAGCTGCGACATTCCGAAGAACGCTGAAGCGCTGGGGGCAAATTGTGCAATTTGCGGC
Cas12a SARS-CoV-2 N2 ds-template0 CTTATCAGACTGATGTTGATTTGCCCCCAGCGCTTCAGCGTTCCAATGACTAG | CTAGTCATTGGAACGCTGAAGCGCTGGGGGCAAATCAACATCAGTCTGATAAG
Cas12a SARS-CoV-2 N2 ds-template1 ACATTGTCTGCTGGGTTTCTTTGCCCCCAGCGCTTCAGCGTTCCAATGACTAG | CTAGTCATTGGAACGCTGAAGCGCTGGGGGCAAAGAAACCCAGCAGACAATGT
Cas12a SARS-CoV-2 N2 ds-template2 TGGCTCAGTTCAGCAGGAATTTGCCCCCAGCGCTTCAGCGTTCCAATGACTAG | CTAGTCATTGGAACGCTGAAGCGCTGGGGGCAAATTCCTGCTGAACTGAGCCA
Cas12a SARS-CoV-2 N2 ds-template3 AGCAGCATTGTACAGGGCTTTTGCCCCCAGCGCTTCAGCGTTCCAATGACTAG | CTAGTCATTGGAACGCTGAAGCGCTGGGGGCAAAAGCCCTGTACAATGCTGCT

Table 3. DNA Target and Primer Sequences

In Table 1. and Table 2., complementary binding crRNA and DNA sequences are underlined.

RMSD

Video 1. 2 ns motion of Cas12a-crRNA-DNA complex obtained by molecular dynamics simulation.

Root Mean Square Deviation (RMSD) represents the difference between two structures, the target structure and the initial structure. If the RMSD converges to a constant value, it can be considered that the system has reached equilibrium. After completing the full 2 ns MD simulation, the RMSD is calculated using MDAnalysis 10.

\begin{equation} \begin{aligned} \text{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \mathbf{r}_i(t) - \mathbf{r}_i(0) \right)^2} \notag\\ \end{aligned} \end{equation}

Then, we conducted the MD simulation using the combinations in Table 1, and the obtained RMSD are shown in Figure 1, 2, and 3.

hogehoge

Figure 1.

hogehoge

Figure 2.

hogehoge

Figure 3.

As seen in Figure 1., the Cascade-crRNA-DNA complex does not show a flat line, particularly regarding PC, during the 2 ns simulation, indicating that the complex has not reached equilibrium. In contrast, many of the Cas12a-crRNA-DNA complexes exhibit a flat line within 2 ns, suggesting that they have reached equilibrium.

When analyzing larger structures, the time required for simulation is long. Cascade is a complex of multiple proteins, and due to its significantly larger structure compared to Cas12a, the 2 ns simulation time was insufficient for it to reach equilibrium. In the following analysis, we will focus on the Cas12a complexes, where many combinations show RMSD convergence, i.e. equilibration, within the 2 ns simulation time, to analyze the stability of the complex at equilibrium.

Analysis


RMSF

Root Mean Square Fluctuation (RMSF) calculates the deviation from the average structure and represents fluctuations from the equilibrium state. It allows us to identify fluctuating regions or residues, which can be used to evaluate flexibility and consider functionality. Here, RMSF is calculated to assess the stability of the complex in its equilibrium state. The RMSF of the i-th residue is defined by the following equation. When plotting the RMSF for different DNA sequences, the graph becomes difficult to interpret due to the large amount of noise, causing overlap between the lines. Therefore, a Gaussian filter was applied to remove the noise, making it easier to compare RMSF across different DNA sequences.

\begin{equation} \begin{aligned} \text{RMSF}_i = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left( \mathbf{r}_i(t) - \langle \mathbf{r}_i \rangle \right)^2} \notag\\ \end{aligned} \end{equation}
hogehoge

Figure 4.

hogehoge

Figure 5.

Number of hydrogen bonds

To see the interactions between the protein and crRNA, we examined the time evolution of the number of hydrogen bonds formed between them 11. The criteria for a hydrogen bonds were defined as \(donor-acceptor\ distance < 3\text{\AA},\ donor-hydrogen-acceptor\ angle > 150^{\circ}\). The donor was considered to be the protein, and the acceptor was the nucleic acid. Since the number of hydrogen bonds also exhibits significant noise, a Gaussian filter was applied to remove the noise, making it easier to compare the number of hydrogen bonds across different DNA sequences.

hogehoge

Figure 6.

hogehoge

Figure 7.

Conclusions


A smaller RMSF value indicates smaller fluctuations from the equilibrium state, meaning the complex is more stable. Additionally, the more hydrogen bonds formed between Cas12a and crRNA, the stronger the interaction between them, leading to increased stability of the complex. Here, we evaluate DNA sequences where the complex stability is higher as superior.

For the combination of Cas12a and EMX1 crRNA, as shown in Figure 4., it was confirmed that the RMSF of DNA ds-template1 and ds-template2 is relatively large. Furthermore, Figure 6. shows that the number of hydrogen bonds is relatively low when the DNA is ds-template0 or ds-template2. These results suggest that the complex stability is highest when the DNA is ds-template3. In our Dry Lab, we predicted that when 3step-SDA is connected with dsDNA elongation, the signal amplification efficiency, as well as the robustness of the amplification efficiency in terms of time, enzyme, and S/N ratio, would be the highest (Dry lab_Model_Amplification_comparison). By conducting MD simulations and demonstrating that the stability of the Cas12a-crRNA-DNA complex is highest when the DNA is ds-template3, we were able to propose from a different approach that linking 3step-SDA with dsDNA elongation is the preferable amplification method.

On the other hand, as shown in Figure 5. and Figure 7., for the combination of Cas12a and SARS-CoV-2 N2 crRNA, there are no clear differences in terms of RMSF or the number of hydrogen bonds among ds-template0, ds-template1, ds-template2, and ds-template3. This indicates that there is no significant difference in complex stability among the four DNA sequences. Therefore, we concluded that the 3step-SDA, predicted to be superior in terms of amplification efficiency and robustness, is also preferable when using SARS-CoV-2 N2 crRNA.

Future


In this discussion, we implicitly assumed that the more stable the equilibrium state of the Cas12a-crRNA-DNA complex, the more favorable it is for DNA detection. However, this assumption lacks sufficient evidence, and further investigation is needed to determine whether it is indeed correct. Additionally, there are residues that exhibit notably high RMSF values (for instance, around residue 600 for ds-template2, and around residue 1100 for both ds-template1 and ds-template2 in Figure 4.). Residues with high RMSF values may be involved in binding interactions. Further research is needed to explore the relationship between the differences in DNA and the residues with relatively high RMSF values, and this could potentially lead to the design of crRNA that reduces RMSF.

Appendix


We create an MD simulation manual to help future iGEM teams run MD simulations smoothly. The manual describes how to use GROMACS 3 and GENESIS 12, 13, which we used.

The manual is as follows,

References


  1. Abramson, J., Adler, J., Dunger, J., Evans, R., Green, T., Pritzel, A., Ronneberger, O., Willmore, L., Ballard, A. J., Bambrick, J., Bodenstein, S. W., Evans, D. A., Hung, C.-C., O’Neill, M., Reiman, D., Tunyasuvunakool, K., Wu, Z., Žemgulytė, A., Arvaniti, E., Beattie, C., … Jumper, J. M. (2024). Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 630493-500. https://doi.org/10.1038/s41586-024-07487-w

  2. Abraham, M., Alekseenko, A., Basov, V., Bergh, C., Briand, E., Brown, A., Doijade, M., Fiorin, G., Fleischmann, S., Gorelov, S., Gouaillardet, G., Gray, A., Irrgang, M. E., Jalalypour, F., Jordan, J., Kutzner, C., Lemkul, J. A., Lundborg, M., Merz, P., … Lindahl, E. (2024). GROMACS 2024.3 Source code (2024.3). Zenodo. https://doi.org/10.5281/zenodo.13456374

  3. M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E. Lindahl (2015). GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX, 119-25. https://doi.org/10.1016/j.softx.2015.06.001

  4. Páll, S., Abraham, M.J., Kutzner, C., Hess, B., Lindahl, E. (2015). Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS.In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale, 87593-27. https://doi.org/10.1007/978-3-319-15976-8_1

  5. S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R. Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl. (2013). GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit.Bioinformatics, 29845-854. https://doi.org/10.1093/bioinformatics/btt055

  6. B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl. (2008). GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation.J. Chem. Theory Comput., 4(3), 435-447. https://doi.org/10.1021/ct700301q

  7. D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C. Berendsen. (2005). GROMACS: Fast, Flexible and Free.J. Comp. Chem., 26 1701-1719. https://doi.org/10.1002/jcc.20291

  8. E. Lindahl and B. Hess and D. van der Spoel. (2001). GROMACS 3.0: A package for molecular simulation and trajectory analysis.J. Mol. Mod., 7 306-317. https://doi.org/10.1007/s008940100045

  9. H. J. C. Berendsen, D. van der Spoel and R. van Drunen. (1995). GROMACS: A message-passing parallel molecular dynamics implementation.Comp. Phys. Comm., 91 43-56. https://doi.org/10.1016/0010-4655(95)00042-E

  10. MDAnalysis 4.2.4. Calculating root mean square quantities. https://docs.mdanalysis.org/1.1.1/documentation_pages/analysis/rms.html

  11. Smith, P., Ziolek, R. M., Gazzarrini, E., Owen, D. M., & Lorenz, C. D. (2019). On the interaction of hyaluronic acid with synovial fluid lipid membranes.Physical Chemistry Chemical Physics, 19. https://doi.org/10.1039/C9CP01532A

  12. Kobayashi, C., Jung, J., Matsunaga, Y., Mori, T., Ando, T., Tamura, K., ... & Sugita, Y. (2017). GENESIS 1.1: A hybrid‐parallel molecular dynamics simulator with enhanced sampling algorithms on multiple computational platforms. Journal of Computational Chemistry, 38, 2193-2206. http://dx.doi.org/10.1002/jcc.24874

  13. Jung, J., Mori, T., Kobayashi, C., Matsunaga, Y., Yoda, T., Feig, M., & Sugita, Y. (2015). GENESIS: a hybrid‐parallel and multi‐scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations. Wiley Interdisciplinary Reviews: Computational Molecular Science, 5(4), 310-323. https://doi.org/10.1002/wcms.1220