diff --git a/static/css/content.css b/static/css/content.css index 0e9ede91a9594523a4633d243ae5ed8ee0fc2fc2..10c7b2e27ad73ad7d0a9b934cbd6191d5d51ecb6 100644 --- a/static/css/content.css +++ b/static/css/content.css @@ -53,30 +53,62 @@ article p { overflow: visible; }
+ + .cinfo:nth-child(1){ background: url(https://static.igem.wiki/teams/4223/wiki/h1/b/1.png) no-repeat; - background-position:0 0% ; background-size: 30%; + animation: cinfo1 10s infinite linear alternate; }
+ .cinfo:nth-child(3){ background: url(https://static.igem.wiki/teams/4223/wiki/h1/b/3.png) no-repeat; - background-position:0 90% ; background-size: 30%; + background-position: 420px 560px; + animation: cinfo3 10s infinite linear alternate; }
+ + .cinfo:nth-child(5){ background: url(https://static.igem.wiki/teams/4223/wiki/h1/b/4.png) no-repeat; - background-position:0 90% ; background-size: 25%; + background-position: 330px 110%; + animation: cinfo5 10s infinite linear alternate; }
.cinfo:nth-child(6){ background: url(https://static.igem.wiki/teams/4223/wiki/h1/b/5.png) no-repeat; - background-position:100% 100% ; background-size: 20%; + background-position: 100% 100%; + animation: cinfo6 2s infinite linear alternate; }
+ +@keyframes cinfo1{ + from{background-position: 285px -150px;} + to{background-position: 0 0;} +}
+ +@keyframes cinfo3{ + 0%{background-position: 0px 400px;} + 100%{background-position: 420px 560px;} +}
+ +@keyframes cinfo5{ + from{background-position: 0 90%;} + to{background-position: 330px 110%;} +}
+ +@keyframes cinfo6{ + 0%{background-position: 100% 90%;} + 100%{background-position: 100% 110%;} +}
+ + /* position action labels */ .center { position: relative;
diff --git a/wiki/pages/model.html b/wiki/pages/model.html index 8a00e71304c960009740d0ba5b43ce436ee20f4d..9739da3753f1c17ee89d4c35dcbb174ad2dc46e0 100644 --- a/wiki/pages/model.html +++ b/wiki/pages/model.html @@ -1,17 +1,26 @@ {% extends "layout.html" %} {% block htitle %}Model{% endblock %} {% block title %}Model{% endblock %}
-{% block lead %}Explain your model's assumptions, data, parameters, and results in a way that anyone could understand.{% endblock %}
+{% block lead %}Explain your model's assumptions, data, parameters, and results in a way that anyone could understand.{% endblock %}
{% block page_content %} <div class="row display ltext">
<h2>Analysis of physicochemical properties of proteins and preliminary analysis of PNA shielding effect</h2>
- <h3 style="text-transform: capitalize;">Overall overview: Before deciding to use cas13a protein and csm6 protein for experiments, we analyzed their physicochemical properties, hydrophilicity, and stability of protein structure. And a preliminary analysis of the shackling effect of PNA was performed.</h3>
+ <h3 style="text-transform: capitalize;">Overall overview: Before deciding to use the Cas13a and Csm6 proteins for experiments, we analyzed their physicochemical properties, hydrophilicity, and structural stability, and performed a preliminary analysis of the shielding effect of the PNA shackle.</h3>
<h3>1.Hydrophilicity analysis of proteins</h3>
- <p>The amino acid sequence of a protein determines its hydrophilicity/hydrophobicity, the hydrophilicity determines its solubility in water, the hydrophobicity facilitates the protein to fold internally to form secondary structures, further to form structural domains, tertiary structures, etc., and the strong hydrophobicity facilitates the protein to form an a-helix to increase stability. Here we used Proscale's open source program to predict the hydrophobicity of cas13 and csm6 from amino acid composition</p>
+ <p>The amino acid sequence of a protein determines its hydrophilicity and hydrophobicity: hydrophilicity determines the protein's solubility in water, while hydrophobicity drives the protein to fold inward, forming secondary structures and, further, structural domains, tertiary structures, and so on; strong hydrophobicity also favors the formation of an α-helix, which increases stability. Here we used the ProtScale open-source program to predict the hydrophobicity of Cas13a and Csm6 from their amino acid composition.</p>
<p>Amino acid hydrophilicity/hydrophobicity scores:</p>
- <p>Ala: 1.800 Arg: -4.500 Asn: -3.500 Asp: -3.500 Cys: 2.500 Gln: -3.500 Glu: -3.500 Gly: -0.400 His: -3.200 Ile: 4.500 Leu: 3.800 Lys: -3.900 Met: 1.900 Phe: 2.800 Pro: -1.600 Ser: -0.800 Thr: -0.700 Trp: -0.900 Tyr: -1.300 Val: 4.200 : -3.500 : -3.500 : -0.490 </p>
+ <p>Ala: 1.800 Arg: -4.500 Asn: -3.500 Asp: -3.500 Cys: 2.500 Gln: -3.500 Glu: -3.500 Gly: -0.400 His: -3.200 Ile: 4.500 Leu: 3.800 Lys: -3.900 Met: 1.900 Phe: 2.800 Pro: -1.600 Ser: -0.800 Thr: -0.700 Trp: -0.900 Tyr: -1.300 Val: 4.200 Asx (B): -3.500 Glx (Z): -3.500 Xaa (X): -0.490</p>
<p>The predicted hydrophobicity of Cas13a is as follows:</p>
<div class="opic annotation"> <img src="" alt="png"> @@ -24,55 +33,268 @@ </div>
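+ <p>Such a profile is straightforward to reproduce: average the Kyte-Doolittle scores listed above over a sliding window along the sequence. The sketch below is a minimal illustration, assuming the full amino acid sequence is available as a string; the window size of 9 is an assumed default, not necessarily the exact setting we used.</p>
+ <code>
# Minimal sketch of a ProtScale-style hydropathy profile,
# using the Kyte-Doolittle scores listed above.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
      "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
      "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
      "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def hydropathy_profile(seq, window=9):
    # Mean score over a sliding window; positive = hydrophobic stretch.
    scores = [KD[aa] for aa in seq]
    half = window // 2
    return [sum(scores[i - half:i + half + 1]) / window
            for i in range(half, len(seq) - half)]
+ </code>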
<h3>2.Analysis of physicochemical properties of proteins</h3>
<h3>Cas13a protein:</h3>
- <p>Number of amino acids: 449</p> - <p>Molecular weight: 52595.25</p> - <p>Theoretical pI: 5.56</p>
+ <p class="l">Number of amino acids: 449</p> + <p class="l">Molecular weight: 52595.25</p> + <p class="l">Theoretical pI: 5.56</p>
<b>Amino acid composition: </b>
- <p>Ala (A) 16 3.6% Arg (R) 12 2.7%</p> - <p>Asn (N) 42 9.4% Asp (D) 29 6.5%</p> - <p>Cys (C) 1 0.2% Gln (Q) 3 0.7%</p> - <p>Glu (E) 45 10.0% Gly (G) 24 5.3%</p> - <p>His (H) 4 0.9% Ile (I) 59 13.1%</p> - <p>Leu (L) 28 6.2% Lys (K) 55 12.2%</p> - <p>Met (M) 13 2.9% Phe (F) 27 6.0%</p> - <p>Pro (P) 4 0.9% Ser (S) 24 5.3%</p> - <p>Thr (T) 19 4.2% Trp (W) 4 0.9%</p> - <p>Tyr (Y) 24 5.3% Val (V) 16 3.6%</p> - <p>Pyl (O) 0 0.0% Sec (U) 0 0.0%</p>
- <b>Total number of negatively charged residues (Asp + Glu): 74Total number of positively charged residues (Arg + Lys): 67</b>
+ <table class="table table-hover table-light table-bordered"> + <tbody class="table-group-divider">
+ <tr><td>Ala (A)</td><td>16</td><td>3.6%</td><td>Arg (R)</td><td>12</td><td>2.7%</td></tr>
+ <tr><td>Asn (N)</td><td>42</td><td>9.4%</td><td>Asp (D)</td><td>29</td><td>6.5%</td></tr>
+ <tr><td>Cys (C)</td><td>1</td><td>0.2%</td><td>Gln (Q)</td><td>3</td><td>0.7%</td></tr>
+ <tr><td>Glu (E)</td><td>45</td><td>10.0%</td><td>Gly (G)</td><td>24</td><td>5.3%</td></tr>
+ <tr><td>His (H)</td><td>4</td><td>0.9%</td><td>Ile (I)</td><td>59</td><td>13.1%</td></tr>
+ <tr><td>Leu (L)</td><td>28</td><td>6.2%</td><td>Lys (K)</td><td>55</td><td>12.2%</td></tr>
+ <tr><td>Met (M)</td><td>13</td><td>2.9%</td><td>Phe (F)</td><td>27</td><td>6.0%</td></tr>
+ <tr><td>Pro (P)</td><td>4</td><td>0.9%</td><td>Ser (S)</td><td>24</td><td>5.3%</td></tr>
+ <tr><td>Thr (T)</td><td>19</td><td>4.2%</td><td>Trp (W)</td><td>4</td><td>0.9%</td></tr>
+ <tr><td>Tyr (Y)</td><td>24</td><td>5.3%</td><td>Val (V)</td><td>16</td><td>3.6%</td></tr>
+ <tr><td>Pyl (O)</td><td>0</td><td>0.0%</td><td>Sec (U)</td><td>0</td><td>0.0%</td></tr>
+ </tbody> + </table>
+ <b>Total number of negatively charged residues (Asp + Glu): 74. Total number of positively charged residues (Arg + Lys): 67.</b>
<b>Atomic composition:</b>
- <p>Carbon C 2387</p> - <p>Hydrogen H 3725</p> - <p>Nitrogen N 597</p> - <p>Oxygen O 710</p> - <p>Sulfur S 14</p>
+ <table class="table table-hover table-light table-bordered"> + <tbody class="table-group-divider">
+ <tr><td>Carbon</td><td>C</td><td>2387</td></tr>
+ <tr><td>Hydrogen</td><td>H</td><td>3725</td></tr>
+ <tr><td>Nitrogen</td><td>N</td><td>597</td></tr>
+ <tr><td>Oxygen</td><td>O</td><td>710</td></tr>
+ <tr><td>Sulfur</td><td>S</td><td>14</td></tr>
+ </tbody> + </table>
<p>Formula: C(2387)H(3725)N(597)O(710)S(14). Total number of atoms: 7433</p>
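+ <p>Statistics like those above can be reproduced directly from the raw sequence; the sketch below uses Biopython's ProtParam module with a placeholder string rather than the real Cas13a sequence.</p>
+ <code>
# Sketch: ProtParam-style statistics from a protein sequence.
from Bio.SeqUtils.ProtParam import ProteinAnalysis

seq = "MKVSKV"                         # placeholder; substitute the full Cas13a sequence
pa = ProteinAnalysis(seq)
counts = pa.count_amino_acids()        # per-residue counts, as tabulated above
print(len(seq))                        # number of amino acids
print(pa.molecular_weight())           # molecular weight
print(pa.isoelectric_point())          # theoretical pI
print(sum(counts[aa] for aa in "DE"))  # negatively charged residues (Asp + Glu)
print(sum(counts[aa] for aa in "RK"))  # positively charged residues (Arg + Lys)
+ </code>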
<h3>Csm6 protein:</h3>
- <p>Number of amino acids: 422</p> - <p>Molecular weight: 49126.42</p> - <p>Theoretical pI: 6.54</p> - <b>Amino acid composition: </b>
- <p>Ala (A) 16 3.8% Arg (R) 17 4.0%</p> - <p>Asn (N) 41 9.7% Asp (D) 23 5.5%</p> - <p>Cys (C) 2 0.5% Gln (Q) 9 2.1%</p> - <p>Glu (E) 35 8.3% Gly (G) 13 3.1%</p> - <p>His (H) 9 2.1% Ile (I) 36 8.5%</p> - <p>Leu (L) 48 11.4% Lys (K) 39 9.2%</p> - <p>Met (M) 10 2.4% Phe (F) 18 4.3%</p> - <p>Pro (P) 16 3.8% Ser (S) 28 6.6%</p> - <p>Thr (T) 16 3.8% Trp (W) 4 0.9%</p> - <p>Tyr (Y) 18 4.3% Val (V) 24 5.7%</p> - <p>Pyl (O) 0 0.0% Sec (U) 0 0.0%</p>
- <b>Total number of negatively charged residues (Asp + Glu): 58Total number of positively charged residues (Arg + Lys): 56</b>
+ <p class="l">Number of amino acids: 422</p> + <p class="l">Molecular weight: 49126.42</p> + <p class="l">Theoretical pI: 6.54</p> + <b class="l">Amino acid composition: </b>
+ <table class="table table-hover table-light table-bordered"> + <tbody class="table-group-divider">
+ <tr><td>Ala (A)</td><td>16</td><td>3.8%</td><td>Arg (R)</td><td>17</td><td>4.0%</td></tr>
+ <tr><td>Asn (N)</td><td>41</td><td>9.7%</td><td>Asp (D)</td><td>23</td><td>5.5%</td></tr>
+ <tr><td>Cys (C)</td><td>2</td><td>0.5%</td><td>Gln (Q)</td><td>9</td><td>2.1%</td></tr>
+ <tr><td>Glu (E)</td><td>35</td><td>8.3%</td><td>Gly (G)</td><td>13</td><td>3.1%</td></tr>
+ <tr><td>His (H)</td><td>9</td><td>2.1%</td><td>Ile (I)</td><td>36</td><td>8.5%</td></tr>
+ <tr><td>Leu (L)</td><td>48</td><td>11.4%</td><td>Lys (K)</td><td>39</td><td>9.2%</td></tr>
+ <tr><td>Met (M)</td><td>10</td><td>2.4%</td><td>Phe (F)</td><td>18</td><td>4.3%</td></tr>
+ <tr><td>Pro (P)</td><td>16</td><td>3.8%</td><td>Ser (S)</td><td>28</td><td>6.6%</td></tr>
+ <tr><td>Thr (T)</td><td>16</td><td>3.8%</td><td>Trp (W)</td><td>4</td><td>0.9%</td></tr>
+ <tr><td>Tyr (Y)</td><td>18</td><td>4.3%</td><td>Val (V)</td><td>24</td><td>5.7%</td></tr>
+ <tr><td>Pyl (O)</td><td>0</td><td>0.0%</td><td>Sec (U)</td><td>0</td><td>0.0%</td></tr>
+ </tbody> + </table>
+ <b>Total number of negatively charged residues (Asp + Glu): 58. Total number of positively charged residues (Arg + Lys): 56.</b>
<b>Atomic composition:</b>
- <p>Carbon C 2216</p> - <p>Hydrogen H 3502</p> - <p>Nitrogen N 584</p> - <p>Oxygen O 651</p> - <p>Sulfur S 12</p>
+ <table class="table table-hover table-light table-bordered"> + <tbody class="table-group-divider">
+ <tr><td>Carbon</td><td>C</td><td>2216</td></tr>
+ <tr><td>Hydrogen</td><td>H</td><td>3502</td></tr>
+ <tr><td>Nitrogen</td><td>N</td><td>584</td></tr>
+ <tr><td>Oxygen</td><td>O</td><td>651</td></tr>
+ <tr><td>Sulfur</td><td>S</td><td>12</td></tr>
+ </tbody> + </table>
<p>Formula: C(2216)H(3502)N(584)O(651)S(12). Total number of atoms: 6965</p>
<h3>3.Preliminary fit analysis of the fluorescence signal data measured in the laboratory</h3>
- <p>Taking cas13 as an example, we measured in the laboratory the fluorescence signal intensity generated by Crispr/cas13 recognition target at different time and PNA concentration conditions:</p>
+ <p>Taking Cas13a as an example, we measured in the laboratory the fluorescence signal intensity generated by CRISPR/Cas13 recognition of the target under different reaction times and PNA concentrations:</p>
<div class="opic large"> <img src="" alt="png"> </div>
@@ -80,21 +302,44 @@ <div class="opic large"> <img src="" alt="png"> </div>
- <p>Because PNA is expensive, we used different substrate concentrations at fixed PNA concentrations as relative concentrations to represent the concentration of PNA. It can be seen that under certain conditions of reaction time and substrate concentration, the fluorescence signal produced with increasing reaction time becomes stronger, while the effect of the fluorescence signal produced with increasing PNA concentration becomes weaker, but not obvious. This indicates that when the concentration of PNA increases, it also gradually produces interference with the nucleic acid to be tested.</p>
+ <p>Because PNA is expensive, we fixed the PNA concentration and varied the substrate concentration, using this ratio as a relative measure of PNA concentration. It can be seen that, for a given substrate concentration, the fluorescence signal grows stronger with increasing reaction time, while increasing the relative PNA concentration weakens the signal slightly, though the effect is not obvious. This indicates that as the PNA concentration increases, it gradually begins to interfere with the nucleic acid to be tested as well.</p>
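+ <p>For reference, a time course of this shape can be fitted with a simple saturating model F(t) = Fmax(1 - e^(-kt)). The sketch below shows the procedure with illustrative arrays in place of our measured values.</p>
+ <code>
# Sketch: fit a saturating exponential to a fluorescence time course.
import numpy as np
from scipy.optimize import curve_fit

t = np.array([0, 5, 10, 20, 40, 60])        # time (min), illustrative
F = np.array([0, 210, 380, 640, 890, 970])  # fluorescence (a.u.), illustrative

def model(t, Fmax, k):
    return Fmax * (1 - np.exp(-k * t))

(Fmax, k), _ = curve_fit(model, t, F, p0=(1000.0, 0.05))
+ </code>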
<h2 style="text-transform: capitalize;">Kinetic simulation of the entire reaction </h2>
- <b class="l">Overall overview: The CRISPR/Cas system is a technique for the modification of target genes by RNA-directed Cas proteins derived from bacterial acquired immunity.CRISPR/Cas bases can be classified into three types, of which, subtype A of the type III CRISPR-Cas system has an effector complex called Csm, which consists of multiple Cas proteins together with crRNA The Csm complex is not only able to cleave target RNA complementary to crRNA, but also the binding of target RNA can activate two new enzymatic activities generated by the Csm complex, namely the cleavage of ssDNA during transcription and the activity of synthesizing cyclic oligoadenylate cOA, which can activate Csm6 as a second messenger to degrade RNA non-specifically. type III CRISPR can generate two types of cOA: cA6 and cA4, which can bind to the CARF structural domain on the cas protein, which in turn activates the HEPN structural domain for non-specific cleavage, cutting off the pre-engineered fluorescent group from the bursting agent and releasing the fluorescent signal.</b>
- <h3 class="l" style="text-transform: capitalize;">Because of the time issue and the impact of the epidemic, we only completed the proof-of-concept of Cas13a and Cas14a under the PNA shielding effect in the laboratory, and did not do the validation experiments of Csm6 in tandem with Cas13a. Therefore, in our modeling, we simulated this process of Csm6 reacting in tandem with cas13a and releasing the fluorescence effect, and simulated the effect of shielding the mutant chain with PNA shackles.</h3>
+ <b class="l">Overall overview: The CRISPR/Cas system, derived from bacterial acquired immunity, is a technique for modifying target genes with RNA-guided Cas proteins. CRISPR/Cas systems can be classified into three types; subtype A of the type III CRISPR-Cas system has an effector complex called Csm, which consists of multiple Cas proteins together with a crRNA. The Csm complex is not only able to cleave target RNA complementary to the crRNA; binding of the target RNA also activates two new enzymatic activities of the complex, namely cleavage of ssDNA during transcription and synthesis of cyclic oligoadenylate (cOA), a second messenger that activates Csm6 to degrade RNA non-specifically. Type III CRISPR can generate two types of cOA, cA4 and cA6, which bind to the CARF structural domain on the Cas protein; this in turn activates the HEPN structural domain for non-specific cleavage, severing the pre-designed fluorophore from its quencher and releasing the fluorescent signal.</b>
+ <h3 class="l" style="text-transform: capitalize;">Because of time constraints and the impact of the epidemic, we only completed the proof of concept for Cas13a and Cas14a under the PNA shielding effect in the laboratory, and did not perform the validation experiments of Csm6 in tandem with Cas13a. Therefore, in our modeling, we simulated the process in which Csm6 reacts in tandem with Cas13a and releases the fluorescence signal, and simulated the effect of shielding the mutant strand with the PNA shackle.</h3>
<div class="opic annotation"> <img src="" alt="png"> <p>Reaction system diagram</p> </div>
<h3>1.Shielding of mismatched-strand RNA by shackled PNA</h3>
- <p>In the process of crispr/cas detection, the cleavage of target by csm6 can be misidentified due to misidentification with similar targets, and we designed the shackled PNA for shielding the mismatched chain to solve this problem. Here we briefly simulated the kinetics of PNA binding to the target and mismatched chains after entering the reaction system.</p>
+ <p>During CRISPR/Cas detection, cleavage by Csm6 can be triggered by misidentification of targets similar to the true one, so we designed the shackled PNA to shield the mismatched strand and solve this problem. Here we briefly simulated the kinetics of PNA binding to the target and mismatched strands after they enter the reaction system.</p>
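+ <p>A minimal sketch of this competitive-binding kinetics is shown below: PNA is consumed by the mismatched strand, by the target strand, and by self-association, written as mass-action ODEs with illustrative (not fitted) rate constants.</p>
+ <code>
# Sketch: competitive binding of PNA to mismatch, target, and itself.
from scipy.integrate import solve_ivp

k_mut, k_tgt, k_self = 1.0, 0.1, 0.05   # illustrative rate constants

def rhs(t, y):
    pna, tgt, mut = y
    return [-k_mut * pna * mut - k_tgt * pna * tgt - 2 * k_self * pna * pna,
            -k_tgt * pna * tgt,
            -k_mut * pna * mut]

sol = solve_ivp(rhs, (0, 100), [1.0, 0.5, 0.5])   # [PNA], [target], [mutate]
+ </code>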
<div class="opic large"> <img src="" alt="png"> </div>
- <h3>2.Csm6 recognizes RNA and then contacts the synthesis of cyclic tetra- or hexa-adenylate, releasing cA4 and cA6 to activate the CARF structural domain</h3>
- <p>In this process, we assumed that Csm6 is not activated as Csm6off and released cA4 and cA6 as Csm6on when activated, and the reaction equation is as follows:</p>
+ <h3>2.Csm6 recognizes RNA and then catalyzes the synthesis of cyclic tetra- or hexa-adenylate, releasing cA4 and cA6 to activate the CARF structural domain</h3>
+ <p>In this process, we denote inactive Csm6 as Csm6off and activated Csm6, which releases cA4 and cA6, as Csm6on; the reaction equation is as follows:</p>
<div class="opic large"> <img src="" alt="png"> </div>
@@ -108,12 +353,15 @@ <div class="opic large"> <img src="" alt="png"> </div>
- <h3>3.Csm6 triggers the synthesis of cAO, which binds to the CRISPR-associated CARF structural domain and activates the ribonuclease activity of the HEPN structural domain to cleave the pre-designed fluorescent group-fluorescent burster linker and release the fluorescent signal</h3>
+ <h3>3.Csm6 triggers the synthesis of cOA, which binds to the CRISPR-associated CARF structural domain and activates the ribonuclease activity of the HEPN structural domain to cleave the pre-designed fluorophore-quencher linker and release the fluorescent signal</h3>
<p>The classical enzymatic reaction proposed by Michaelis:</p>
<div class="opic large"> <img src="" alt="png"> </div>
- <p>where X is the substrate, E is the catalase, C is the reaction intermediate, P is the reaction product, and Kcat is the rate constant for the conversion of C to free E and the product P.</p>
+ <p>where X is the substrate, E is the enzyme, C is the reaction intermediate, P is the reaction product, and kcat is the rate constant for the conversion of C to free E and the product P.</p>
<p>So we get:</p>
<div class="opic large"> <img src="" alt="png"> </div>
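+ <p>Numerically, the scheme X + E ⇌ C → E + P can be integrated directly as mass-action ODEs; the sketch below uses illustrative rate constants rather than our fitted ones.</p>
+ <code>
# Sketch: the Michaelis-Menten scheme as mass-action ODEs.
from scipy.integrate import solve_ivp

k1, k_1, kcat = 10.0, 1.0, 0.5            # illustrative constants

def mm(t, y):
    x, e, c, p = y                        # substrate, enzyme, complex, product
    v_bind = k1 * x * e - k_1 * c         # net formation of the intermediate C
    return [-v_bind, -v_bind + kcat * c, v_bind - kcat * c, kcat * c]

sol = solve_ivp(mm, (0, 50), [1.0, 0.1, 0.0, 0.0])
+ </code>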
@@ -129,7 +377,11 @@ </div>
<h3>Results:</h3>
<b>After adding PNA to the reaction system, the concentrations of PNA, targetRNA, and mutateRNA in the solution change as follows</b>
- <p>It can be seen that as time changes, the final interfering term mutateRNA is gradually shielded by PNA, while targetRNA is partially depleted, and PNA is depleted the most. This is because PNA will bind to target RNA, mismatch RNA and also to itself at the same time. </p><b>And it can be seen from the graph that the degree of binding mutateRNA>targetRNA>PNA. targetRNA will be retained overwhelmingly and go to the next stage of the reaction together with the few mutateRNAs.</b>
+ <p>It can be seen that, over time, the interfering term mutateRNA is gradually shielded by PNA, while targetRNA is only partially depleted and PNA is depleted the most. This is because PNA binds to target RNA and mismatch RNA, and at the same time to itself. </p><b>And it can be seen from the graph that the degree of binding is mutateRNA > targetRNA > PNA. targetRNA is overwhelmingly retained and proceeds to the next stage of the reaction together with the few remaining mutateRNAs.</b>
<div class="opic large"> <img src="" alt="png"> </div>
@@ -137,22 +389,46 @@ <div class="opic large"> <img src="" alt="png"> </div>
- <b>The cAO binds to the CARF structural domain and activates the HEPN structural domain to non-specifically cleave the designed fluorescent group-fluorescent burster linker and release the fluorescent signal</b>
+ <b>The cOA binds to the CARF structural domain and activates the HEPN structural domain to non-specifically cleave the designed fluorophore-quencher linker and release the fluorescent signal</b>
<div class="opic large"> <img src="" alt="png"> </div>
<h3>Summary</h3>
- <p>Here we tried a variety of values of the pre-RNA coefficients, and it can be seen that when n is 1 it does not match the actual situation, and when it is 2 and 3 it matches the real experimental situation. From the predicted results, we can see that the fluorescence signal is increasing with time by a convex function curve with gradually decreasing growth rate. The whole reaction system will still have a part of single base mismatch strand that cannot be completely shielded and recognized by Csm6 after adding the shackle PNA, which interferes with the detection. This kinetic modeling helps us to better understand the reaction process of detection when Csm6-cas13a is in tandem.</p>
- <h2 style="text-transform: capitalize;">Solving chain temperature prediction model based on nearest neighbor method model and DNN neural network</h2>
- <b>Overall overview: NATer - Nucleic acid tracker achieves accuracy specific to a single base by blocking the mismatched strand with PNA, but unlike using DNA as a yoke, the backbone of PNA is peptide nucleic acid, 1. When it binds complementarily to the nucleic acid strand, the unlinking temperature will be relatively DNA/DNA binding is greatly elevated; 2. When there are base mismatches present, one base pair mismatch can lower the Tm value by 8-20°C (15°°C on average), which is much higher than the lowered temperature for DNA/DNA binding. Therefore, we also need a modeling approach to simulate the prediction of the optimal unstranding temperature when shielding with PNA. In our model, you can see our method to achieve it.</b>
+ <p>Here we tried several values of the coefficient n in front of the RNA term. When n is 1 the model does not match the actual situation, while n = 2 and n = 3 match the real experimental results. From the predicted results, the fluorescence signal increases with time along a convex curve whose growth rate gradually decreases. Even after the shackle PNA is added, part of the single-base-mismatch strand cannot be completely shielded and is still recognized by Csm6, which interferes with the detection. This kinetic modeling helps us better understand the detection reaction when Csm6 and Cas13a work in tandem.</p>
+ <h2 style="text-transform: capitalize;">Melting temperature (Tm) prediction model based on the nearest-neighbor model and a DNN neural network</h2>
+ <b>Overall overview: NATer - Nucleic Acid Tracker achieves single-base specificity by blocking the mismatched strand with PNA. Unlike a DNA yoke, however, the backbone of PNA is a peptide nucleic acid:
1. when it binds complementarily to the nucleic acid strand, the melting temperature is greatly elevated relative to DNA/DNA binding; 2. when base mismatches are present, a single base-pair mismatch can lower the Tm value by 8-20 °C (15 °C on average), a much larger drop than for DNA/DNA binding. Therefore, we also need a modeling approach to predict the optimal melting temperature when shielding with PNA. In our model, you can see our method for achieving this.</b>
<h3>1.Nearest Neighbor Method Model</h3>
- <p>Currently, the standard enthalpy change ΔH° and the entropy change AS of the hybridization reaction of DNA molecules are calculated using the Nearest Neighbor Mdel model proposed by Borer et al. in 1974. The main idea of Nearest Neigh-bor Mdel is that the calculation of the standard enthalpy change ΔH° and entropy change ΔS° of the hybridization reaction of DNA molecules is transformed into the cumulative sum of the standard brackish change and standard entropy change of the 10 dimers (Duplex) formed by the four bases {A,G,C,T} of the DNA molecule, plus the effect of the starting base pair of GC or AT, the ending base pair of TA and the sequence symmetry of the DNA molecule. The following table shows the effect of the Santalucucleus The following table shows the improved duplex parameters given by Santalucia et al. in 1996.</p>
+ <p>Currently, the standard enthalpy change ΔH° and entropy change ΔS° of the hybridization reaction of DNA molecules are calculated using the Nearest Neighbor Model proposed by Borer et al. in 1974. Its main idea is that ΔH° and ΔS° of the hybridization reaction are computed as the cumulative sum of the standard enthalpy and entropy changes of the 10 dimers (duplexes) formed by the four bases {A,G,C,T} of the DNA molecule, plus corrections for a starting G·C or A·T base pair, a terminal T·A base pair, and the sequence symmetry of the DNA molecule. The following table shows the improved duplex parameters given by SantaLucia et al. in 1996.</p>
<div class="opic large"> <img src="" alt="png"> </div>
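+ <p>Computationally, the nearest-neighbor method is a table lookup: the ΔH° and ΔS° contributions of consecutive dimers are summed and converted to a melting temperature, e.g. via Tm = ΔH°/(ΔS° + R·ln(C_T/4)) for a non-self-complementary duplex. The sketch below shows the shape of the calculation only; the dimer table is deliberately abridged to two illustrative entries and must be filled in with the published parameters (plus initiation and symmetry corrections).</p>
+ <code>
# Sketch: nearest-neighbor Tm; NN table abridged, values illustrative.
import math

NN = {"AA": (-7.9, -22.2), "CG": (-10.6, -27.2)}  # dH kcal/mol, dS cal/(mol*K)

def tm_celsius(seq, c_total=0.25e-6, R=1.987):
    dH = sum(NN[seq[i:i + 2]][0] for i in range(len(seq) - 1))  # kcal/mol
    dS = sum(NN[seq[i:i + 2]][1] for i in range(len(seq) - 1))  # cal/(mol*K)
    return 1000.0 * dH / (dS + R * math.log(c_total / 4)) - 273.15
+ </code>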
<h3>2.Validation analysis of DEA-SBM super-efficiency model for the nearest neighbor method</h3>
- <p>We used a non-radial SBM model based on the DEA model to obtain the desired efficiency, using the real unchaining temperature measured in the laboratory as the dependent variable and the number of CG/GG/GA nearest-neighbor dimers as the independent variable, and the model was constructed as follows:</p>
- <p>Assume that the double-stranded nucleic acid has n decision units, each including m inputs, s1 desired outputs and s2 non-desired outputs, which can be represented by vectors, respectively, as</p>
+ <p>We used a non-radial SBM model based on the DEA framework to obtain the desired efficiency, taking the real melting temperature measured in the laboratory as the dependent variable and the numbers of CG/GG/GA nearest-neighbor dimers as the independent variables; the model was constructed as follows:</p>
+ <p>Assume that the double-stranded nucleic acid has n decision units, each including m inputs, s1 desired outputs and s2 undesired outputs, which can be represented by vectors, respectively, as</p>
<div class="opic large"> <img src="" alt="png"> </div>
@@ -173,47 +449,86 @@ <b>It can be seen that the average efficiency value is 0.987, indicating that the nearest-neighbor model fits very well and predicts the melting temperature with very high efficiency.</b>
<h3>3.Neural network prediction of the nucleic acid melting temperature based on the nearest-neighbor model</h3>
<h4>1 Fully Connected Neural Network establishment</h4>
- <p>As shown in Figure 1, we construct a four-layer Fully Connected Neural Network network with 10 neural units per hidden layer,</p>
+ <p>As shown in Figure 1, we construct a four-layer fully connected neural network with 10 neural units per hidden layer.</p>
<div class="opic annotation"> <img src="" alt="png"> <p>Figure 1 Fully connected neural network sketch</p> </div>
<h4>2 Activation function - Mish</h4>
- <p>Since our focus is on the relationship between base sequences and temperature coefficients, and since the cleavage temperature of base pairs is not deterministic, and the probability of deconvolution increases with increasing temperature, but there is still a small probability that some base pairs will deconvolute before reaching a specific temperature, we cannot use the most conventional ReLU function for activation because it will produce a dead point before the 0 point, which does not correspond to the actual situation we expect from the model, and use the Mish function for activation</p><b>:f(x)= x・tanh(ς(x)) </b>
+ <p>Since our focus is on the relationship between base sequences and temperature coefficients, and since the separation temperature of base pairs is not deterministic - the probability of strand separation increases with temperature, yet with small probability some base pairs separate before a specific temperature is reached - we cannot use the most conventional ReLU function for activation, because it produces a dead region below the 0 point that does not correspond to the behavior we expect from the model. We therefore use the Mish function for activation:</p><b>f(x) = x · tanh(ς(x))</b>
<p>where ς(x) = ln(1+e^x) is the softplus activation function.</p>
<div class="opic annotation"> <img src="" alt="png">
<p>Figure 2 Mish function sketch</p> </div>
- <p>We consider its low cost and its various properties (e.g., smooth and non-monotonic properties, unbounded above and bounded below) to improve its performance compared to other commonly used features such as ReLU (rectified linear unit). ) The properties of Mish are described in detail below:</p>
- <p>1. Unbounded above and unbounded below: Unbounded above is an ideal property for any activation function because it avoids saturation, which can cause a sharp slowdown in training. Thus, speeding up the training process. The bounded below property helps to achieve a strong regularization effect (fitting the model correctly). (This property of Mish is similar to that of ReLU and Swish in the range [≈ 0.31, ∞)).</p>
- <p>2. Non-monotonic function: This property helps to retain small negative values and thus stabilize the network gradient flow. The most commonly used activation functions, such as ReLU [f(x) = max(0, x)] and Leaky ReLU [f(x) = max(0, x), 1] cannot retain negative values because their differentiation is 0 , so most neurons will not update.</p>
- <p>3. Infinite continuity and smoothness: Mish as a smoothing function improves the results because it is better in terms of both generalization and effective optimization of the results. In the figure, it can be seen that the smoothness of the landscape changes dramatically between ReLU and Mish with random initialization of the neural network. However, in the case of Swish and Mish, the landscapes are roughly similar.</p>
- <p>4. High computational cost but better performance: it is expensive compared to ReLU, but in deep neural networks, the results are better compared to ReLU.</p>
+ <p>We chose Mish for its low cost and its various properties (e.g., smooth and non-monotonic, unbounded above and bounded below), which improve its performance compared to other commonly used activation functions such as ReLU (rectified linear unit). The properties of Mish are described in detail below:</p>
+ <p>1. Unbounded above and bounded below: being unbounded above is an ideal property for any activation function because it avoids saturation, which can cause a sharp slowdown in training; this speeds up the training process. Being bounded below helps to achieve a strong regularization effect (fitting the model correctly). (This property of Mish is similar to that of ReLU and Swish, with a range of [≈ -0.31, ∞).)</p>
+ <p>2. Non-monotonic function: this property helps to retain small negative values and thus stabilize the network gradient flow. The most commonly used activation functions, such as ReLU [f(x) = max(0, x)] and Leaky ReLU [f(x) = max(αx, x)], do not preserve small negative values the way Mish does; for ReLU in particular, the derivative is 0 for negative inputs, so most of those neurons will not update.</p>
+ <p>3. Infinite continuity and smoothness: Mish is a smooth function, which improves results because it is better in terms of both generalization and effective optimization. In the figure, it can be seen that the smoothness of the loss landscape changes dramatically between ReLU and Mish with random initialization of the neural network. However, in the case of Swish and Mish, the landscapes are roughly similar.</p>
+ <p>4. High computational cost but better performance: Mish is expensive compared to ReLU, but in deep neural networks it gives better results than ReLU.</p>
<div class="opic annotation"> <img src="" alt="png"> <p>Figure 3 ReLU, Mish, Swish calculation iteration comparison chart</p> </div>
<p>Coding reference for a Python implementation with PyTorch as the architecture:</p>
- <p>import torch</p> - <p>import torch.nn.functional as fun</p> - <p>def swish(x,beta=1):</p> - <p> return x*torch.nn.Sigmoid()(x*beta)</p> - <p>def mish(x):</p> - <p> return x*(torch.tanh(F.softplus(x)))</p> - <p>class Mish(nn.Module):</p> - <p> def __init__(self):</p> - <p> super().__init__()</p> - <p> def forward(self,x):</p> - <p> return x*(torch.tanh(fun.softplus(x)))</p>
+ <code>
import torch
import torch.nn as nn
import torch.nn.functional as fun

def swish(x, beta=1):
    # Swish: x * sigmoid(beta * x)
    return x * torch.sigmoid(beta * x)

def mish(x):
    # Mish: x * tanh(softplus(x))
    return x * torch.tanh(fun.softplus(x))

class Mish(nn.Module):
    def forward(self, x):
        return x * torch.tanh(fun.softplus(x))
+ </code>
<h4>3 Loss function - Huber Loss</h4>
- <p>After comparing the mean square error, mean absolute error, and Huber Loss (smoothed absolute loss) with the squared error loss, the variance of the first two is high, and since we are concerned with the output of temperature coefficient, which is known from thermodynamics that its tracing point changes more slowly and not abruptly, we choose Huber Loss. Huber loss is less sensitive to outliers in the data. It is also differentiable at the value of 0. It is essentially an absolute value that becomes squared when the error is small. How large the error makes its squared value depends on a hyperparameter δ, which can be adjusted. </p><b>When δ ~ 0, the Huber loss tends to MAE; when δ ~ ∞, the Huber loss tends to MSE.</b>
- <p>The choice of δ is critical, as it determines how you view the outliers. Residuals larger than δ are minimized with L1 (which is less sensitive to very large outliers), while residuals smaller than δ are "properly" minimized with L2. One of the big problems with using MAE for training neural networks is that it always has a large gradient, which can lead to missing minima at the end of training the model using gradient descent. With MSE, the gradient gradually decreases as the loss value approaches its minimum, making it more accurate. In these cases, the Huber loss function can be really helpful because it reduces the gradient around the minimum. And it is more robust to outliers than MSE. Thus, it has the advantages of both loss functions, MSE and MAE. However, the Huber loss function also has a problem that we may need to train the hyperparameter δ, and the process requires constant iterations. Since the Tm we compute has a certain predicted mean value, solid we train δ that will edit more in favor of this mean value.</p>
+ <p>After comparing mean squared error (MSE), mean absolute error (MAE), and the Huber loss (smoothed absolute loss), the variance of the first two is high; and since the quantity we care about, the temperature coefficient, is known from thermodynamics to change slowly rather than abruptly, we choose the Huber loss. The Huber loss is less sensitive to outliers in the data, and it is differentiable at 0. It is essentially an absolute-value loss that becomes quadratic when the error is small. How small the error must be for the loss to become quadratic depends on a hyperparameter δ, which can be adjusted. </p><b>When δ → 0, the Huber loss tends to MAE; when δ → ∞, the Huber loss tends to MSE.</b>
+ <p>The choice of δ is critical, as it determines how you treat outliers. Residuals larger than δ are minimized with L1 (which is less sensitive to very large outliers), while residuals smaller than δ are "properly" minimized with L2. One of the big problems with using MAE for training neural networks is that its gradient is always large, which can lead to missing the minimum at the end of gradient-descent training. With MSE, the gradient gradually decreases as the loss approaches its minimum, making the result more accurate. In these cases, the Huber loss is really helpful because it reduces the gradient around the minimum, and it is more robust to outliers than MSE. Thus, it combines the advantages of both loss functions, MSE and MAE. However, the Huber loss also has the problem that the hyperparameter δ may need to be trained, which requires constant iteration. Since the Tm we compute has a known predicted mean value, we train δ so that it is biased toward this mean value.</p>
<p>The Python-based code for this function, using NumPy, is</p>
- <p>Import numpy as np</p> - <p>def huber(true, pred, delta):</p> - <p>loss = np.where(np.abs(true-pred) < delta , 0.5*((true-pred)**2), delta*np.abs(true - pred) - 0.5*(delta**2))</p> - <p> return np.sum(loss)</p>
+ <code>
import numpy as np

def huber(true, pred, delta):
    # Quadratic for |error| < delta, linear (L1-like) beyond it.
    loss = np.where(np.abs(true - pred) < delta,
                    0.5 * (true - pred) ** 2,
                    delta * np.abs(true - pred) - 0.5 * delta ** 2)
    return np.sum(loss)
+ </code>
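+ <p>When training directly in PyTorch, the same loss is available as a built-in; a minimal usage sketch (PyTorch 1.9 or later is assumed):</p>
+ <code>
# Built-in equivalent of the huber() function above.
import torch
from torch import nn

loss_fn = nn.HuberLoss(delta=1.0)   # delta is the hyperparameter discussed above
loss = loss_fn(torch.randn(8, 1), torch.randn(8, 1))
+ </code>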
<h4>4 Regularization method - Dropout</h4>
<p>One of the main challenges in training a model in deep machine learning is co-adaptation. This means that the neurons are interdependent: they have a considerable influence on each other and are not independent enough relative to their inputs. We also often find situations where some neurons have more predictive power than others; in other words, we can become overly dependent on the output of individual neurons.</p>
<p>These effects must be avoided, and the weights must have a certain distribution to prevent overfitting. The co-adaptation and high predictive power of certain neurons can be tuned by different regularization methods. One of the most commonly used is Dropout. The application of the Dropout method is schematically illustrated as follows:</p>
@@ -221,54 +536,64 @@ <img src="" alt="png"> </div>
<p class="c">Figure 4 Dropout application scenario</p>
- <p>To prevent overfitting in the training phase, neurons are randomly removed. In a dense (or fully connected) network, for each layer, we give a dropout probability p. In each iteration, each neuron is removed with probability p. The paper by Hinton et al. suggests a dropout probability of "p=0.2" for the input layer and The dropout probability of the hidden layer is "p=0.5". Obviously, we are interested in the output layer, which is our prediction. So we only use dropout in the hidden layer and do not apply dropout in the input layer.</p>
+ <p>To prevent overfitting in the training phase, neurons are randomly removed. In a dense (or fully connected) network, each layer is given a dropout probability p, and in each iteration each neuron is removed with probability p. The paper by Hinton et al. suggests a dropout probability of p = 0.2 for the input layer and p = 0.5 for the hidden layers. Since the output layer carries our prediction, we apply dropout only in the hidden layers, and not in the input layer either.</p>
<p>The code implementing the Dropout method in Python on the PyTorch architecture is as follows:</p>
- <p>import torch</p> - <p>from torch import nn</p> - <p>from d2l import torch as d2l</p> - <p>def dropout_layer(X, dropout):</p> - <p> assert 0 <= dropout <= 1</p> - <p> if dropout == 1:</p> - <p> return torch.zeros_like(X)</p> - <p> if dropout == 0:</p> - <p> return X</p> - <p> torch.rand()</p> - <p> mask = (torch.rand(X.shape) > dropout).float()</p> - <p> return mask * X / (1.0 - dropout)</p> - <p>dropout1, dropout2,dropout3,dropout4 = 0.2, 0.5,0.6,0.8 #test</p> - <p>class Net(nn.Module):</p> - <p> def __init__(self, num_inputs, num_outputs, num_hiddens1, num_hiddens2,</p> - <p> is_training = True):</p> - <p> super(Net, self).__init__()</p> - <p> self.num_inputs = num_inputs</p> - <p> self.training = is_training</p> - <p> self.lin1 = nn.Linear(num_inputs, num_hiddens1)</p> - <p> self.lin2 = nn.Linear(num_hiddens1, num_hiddens2)</p> - <p> self.lin3 = nn.Linear(num_hiddens2, num_hiddens3)</p> - <p> self.lin4 = nn.Linear(num_hiddens3, num_hiddens4)</p> - <p> self.lin5 = nn.Linear(num_hiddens4, num_outputs)</p> - <p> self.mish = nn.Mish()</p> - <p> def forward(self, X):</p> - <p> H1 = self.mish(self.lin1(X.reshape((-1, self.num_inputs))))</p> - <p> if self.training == True:</p> - <p> H1 = dropout_layer(H1, dropout1)</p> - <p> H2 = self.mish(self.lin2(H1))</p> - <p> if self.training == True:</p> - <p> H2 = dropout_layer(H2, dropout2)</p> - <p>H3 = self.mish(self.lin2(H2))</p> - <p> if self.training == True:</p> - <p> H3 = dropout_layer(H3, dropout3)</p> - <p>H4 = self.mish(self.lin2(H3))</p> - <p> if self.training == True:</p> - <p> H4 = dropout_layer(H4, dropout2)</p> - <p> out = self.lin3(H4)</p> - <p> return out</p> - <p>net = Net(num_inputs, num_outputs, num_hiddens1, num_hiddens2, num_hiddens3,num_hiddens4)</p>
+ <code>
import torch
from torch import nn

def dropout_layer(X, dropout):
    # Zero each activation with probability `dropout`; rescale the rest
    # by 1/(1 - dropout) so the expected activation is unchanged.
    assert 0 <= dropout <= 1
    if dropout == 1:
        return torch.zeros_like(X)
    if dropout == 0:
        return X
    mask = (torch.rand(X.shape) > dropout).float()
    return mask * X / (1.0 - dropout)

dropout1, dropout2, dropout3, dropout4 = 0.2, 0.5, 0.6, 0.8  # test values

class Net(nn.Module):
    def __init__(self, num_inputs, num_outputs, num_hiddens1, num_hiddens2,
                 num_hiddens3, num_hiddens4, is_training=True):
        super(Net, self).__init__()
        self.num_inputs = num_inputs
        self.training = is_training
        self.lin1 = nn.Linear(num_inputs, num_hiddens1)
        self.lin2 = nn.Linear(num_hiddens1, num_hiddens2)
        self.lin3 = nn.Linear(num_hiddens2, num_hiddens3)
        self.lin4 = nn.Linear(num_hiddens3, num_hiddens4)
        self.lin5 = nn.Linear(num_hiddens4, num_outputs)
        self.mish = nn.Mish()

    def forward(self, X):
        # Dropout is applied to the hidden layers only, and only in training.
        H1 = self.mish(self.lin1(X.reshape((-1, self.num_inputs))))
        if self.training:
            H1 = dropout_layer(H1, dropout1)
        H2 = self.mish(self.lin2(H1))
        if self.training:
            H2 = dropout_layer(H2, dropout2)
        H3 = self.mish(self.lin3(H2))
        if self.training:
            H3 = dropout_layer(H3, dropout3)
        H4 = self.mish(self.lin4(H3))
        if self.training:
            H4 = dropout_layer(H4, dropout4)
        return self.lin5(H4)

num_inputs, num_outputs = 10, 1   # example sizes; 10 units per hidden layer as in Figure 1
num_hiddens1 = num_hiddens2 = num_hiddens3 = num_hiddens4 = 10
net = Net(num_inputs, num_outputs, num_hiddens1, num_hiddens2, num_hiddens3, num_hiddens4)
+ </code>
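+ <p>Note that the hand-written dropout_layer above behaves like PyTorch's built-in nn.Dropout, which is switched off automatically by model.eval(); a minimal usage sketch:</p>
+ <code>
# nn.Dropout zeroes activations with probability p and rescales by 1/(1-p).
import torch
from torch import nn

layer = nn.Dropout(p=0.5)
x = torch.randn(4, 10)
train_out = layer(x)   # training mode: random units zeroed, rest scaled
layer.eval()
eval_out = layer(x)    # evaluation mode: identity
+ </code>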
<h4>5 Training results</h4>
- <p>As Form 1 our training set using a total of 72 sets of data established an R-value of 0.98483, which can basically meet the prediction needs and established a standard deviation of 2.34277, possessing a high degree of robustness in Tm judgment.</p>
+ <p>As shown in Table 1, our training set of 72 data groups yielded an R-value of 0.98483, which basically meets the prediction needs, with a standard deviation of 2.34277, giving a high degree of robustness in the Tm judgment.</p>
<div class="opic annotation"> <img src="" alt="png"> <p>Table 1 Statistics</p> </div>
@@ -279,7 +604,9 @@ <img src="" alt="png"> </div>
<b class="c">Figure 5 Fitted Plot & Regression Plot</b>
- <p>The residual variance of the network we built is relatively scattered as shown in Figure 6, and a little high variance residuals in the low location region may have an impact on the low eaves data, which still needs to be improved in the future.</p>
+ <p>As shown in Figure 6, the residuals of the network we built are relatively scattered, and the somewhat high-variance residuals in the low-value region may affect predictions for low-Tm data, which still needs to be improved in the future.</p>
<div class="opic annotation"> <img src="" alt="png"> <p> Figure 6 Residual Plot</p> </div>
@@ -301,4 +628,4 @@
-{% endblock %}
+{% endblock %}
\ No newline at end of file