diff --git a/wiki/pages/model.html b/wiki/pages/model.html index a5d9859d1fe8dcceac142bff89dc52f99643a2ac..f62f2612a3d20df420ae361af3c8179e77f1d6e0 --- a/wiki/pages/model.html +++ b/wiki/pages/model.html @@ -92,7 +92,7 @@ </li> <li> <input id="group-5" type="checkbox" hidden /> - <a href="#s8"><span class="fa fa-angle-right"></span> Conclusion and Future Perspectives </a> + <a href="#s8"><span class="fa fa-angle-right"></span> Success and Future Prospects</a> </li> <li> <input id="group-5" type="checkbox" hidden /> @@ -197,38 +197,51 @@ expression the sequence is considered.</p> <img src="https://static.igem.wiki/teams/4815/wiki/devider.png"> </div> <div class="wenbenkuang"> -<p>1. <a style="font-weight: bolder">Pre-train</a> refers to the use of pre-trained models that have been trained on a large and diverse -dataset. These pre-trained models can then be -fine-tuned on specific datasets for transfer learning to specific tasks.</p> -<p>2.<a style="font-weight: bolder">Fine-tuning</a> refers to the process of adjusting the crucial parameters of a model to make the -model's output approximate the measured quantities. This is achieved -by quantifying the deviation between the model's output and the measured quantities using a loss -function, and then updating the parameters of the Neural Network using -an optimizer to minimize the loss. By repeating this process, the model's output can, to some -extent, represent the measured quantities, i.e., make -predictions.</p> -<p>3. <a style="font-weight: bolder">Tokenizer</a> A tokenizer, or word segmenter, is used to divide a given sequence into shorter -sequences of length kmer. These shorter sequences are -then converted into numerical values based on a predefined mapping table, facilitating the -extraction of feature values by the computer.</p> -<img src="https://static.igem.wiki/teams/4815/wiki/model/model1.png" width=75% class="imgz"> -<p>4. <a style="font-weight: bolder">Optimizer</a>:The optimizer guides the parameters of the loss function (objective function) in the -correct direction and appropriate magnitude during the -backpropagation process of deep learning, allowing the updated parameters to continuously approach -the global minimum point of the objective function.</p> -<p>5. -<a style="font-weight: bolder">Learning rate</a>: When applying the gradient descent algorithm to optimize the learning rate, a -coefficient called the learning rate α is multiplied to the gradient term -in the weight update rule. If the learning rate is too small, convergence will be slow. On the other -hand, if the learning rate is too large, it can lead to cost -function oscillations and overly rapid iteration, causing the gradient descent algorithm to -potentially overshoot the global minimum point or even diverge. As shown in -the figure below, lower loss corresponds to better results.</p> -<img src="https://static.igem.wiki/teams/4815/wiki/model/model2.png" width="42.5%" class="imgz"> -<p> -6. <a style="font-weight: bolder">Loss function</a> The loss function measures the degree of deviation between the predicted and -actual values. A smaller loss function indicates a better prediction -performance.</p> +<p>Note: click a term to expand its definition; double-click to collapse it.</p> +<div class="row frame-wrapper wenbenkuang"> + <div class="frame-box" onclick="handleClick()"> + <div class="frame-title xioabiaoti" style="text-align:center;" title="click to see detail">Pre-train</div> + <div class="frame-content collapsed" title="double click to close"> + <p>Pre-training refers to first training a model on a large and diverse dataset. The resulting pre-trained model can then be fine-tuned on specific datasets for transfer learning to specific tasks.</p> + </div> + </div> + <div class="frame-box" onclick="handleClick()"> + <div class="frame-title xioabiaoti" style="text-align:center;" title="click to see detail">Fine-tuning</div> + <div class="frame-content collapsed" title="double click to close"> + <p>Fine-tuning refers to the process of adjusting the crucial parameters of a model so that the model's output approximates the measured quantities. This is achieved by quantifying the deviation between the model's output and the measured quantities with a loss function, and then updating the parameters of the neural network with an optimizer to minimize the loss. By repeating this process, the model's output can, to some extent, represent the measured quantities, i.e., make predictions. The toy sketch below illustrates the idea.</p> + <p></p> + </div> + </div>
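+  <p>To make the ‘pre-train + fine-tuning’ idea concrete, the toy numeric sketch below (plain Python, not our actual training code; the real model is a Transformer fine-tuned on promoter data) first fits a one-parameter linear model on a large generic dataset and then fine-tunes it on a handful of task-specific points, starting from the pre-trained weight instead of from scratch.</p>
+  <pre><code class="language-python">
+# Toy pre-train / fine-tune sketch with a 1-parameter model y = w * x
+# and a mean-squared-error loss (illustration only).
+def mse_grad(w, xs, ys):
+    n = len(xs)
+    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
+
+def train(w, xs, ys, alpha=0.05, steps=200):
+    for _ in range(steps):
+        w = w - alpha * mse_grad(w, xs, ys)   # optimizer: plain gradient descent
+    return w
+
+# "Pre-training": plenty of generic data where y is roughly 3 * x.
+big_xs = [i / 100.0 for i in range(100)]
+big_ys = [3.0 * x for x in big_xs]
+w_pretrained = train(0.0, big_xs, big_ys)
+
+# "Fine-tuning": only a few task-specific measurements (y = 3.2 * x).
+small_xs = [0.5, 1.0, 1.5]
+small_ys = [1.6, 3.2, 4.8]
+w_finetuned = train(w_pretrained, small_xs, small_ys)
+print(w_pretrained, w_finetuned)   # roughly 3.0, then roughly 3.2
+  </code></pre>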
+  <div class="frame-box" onclick="handleClick()"> + <div class="frame-title xioabiaoti" style="text-align:center;" title="click to see detail">Tokenizer</div> + <div class="frame-content collapsed" title="double click to close"> + <p>A tokenizer, or word segmenter, is used to divide a given sequence into shorter subsequences of length k (k-mers). These shorter sequences are then converted into numerical values based on a predefined mapping table, which allows the computer to extract feature values from them. The short example below illustrates this mapping.</p> + <img src="https://static.igem.wiki/teams/4815/wiki/contribution/211.png" width=50% class="imgz"> + </div> + </div>
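+  <p>As an illustration of k-mer tokenization (a toy vocabulary with k = 3 is assumed here; the actual k and mapping table used by the model may differ), the sketch below splits a DNA sequence into overlapping 3-mers and converts each one into an integer id.</p>
+  <pre><code class="language-python">
+# Toy k-mer tokenizer: split a sequence into overlapping k-mers,
+# then map each k-mer to an integer id via a predefined mapping table.
+from itertools import product
+
+K = 3  # k-mer length (assumed for this example)
+vocab = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=K))}
+
+def tokenize(seq, k=K):
+    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
+    return [vocab[kmer] for kmer in kmers]
+
+print(tokenize("ATGCGT"))   # [14, 57, 38, 27] -- numeric features for the model
+  </code></pre>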
+  <div class="frame-box" onclick="handleClick()"> + <div class="frame-title xioabiaoti" style="text-align:center;" title="click to see detail">Optimizer</div> + <div class="frame-content collapsed" title="double click to close"> + <p>The optimizer adjusts the model's parameters in the correct direction and with an appropriate magnitude during the backpropagation process of deep learning, allowing the updated parameters to continuously approach the global minimum of the objective (loss) function.</p> + <p></p> + </div> + </div> + <div class="frame-box" onclick="handleClick()"> + <div class="frame-title xioabiaoti" style="text-align:center;" title="click to see detail">Learning rate</div> + <div class="frame-content collapsed" title="double click to close"> + <p>When the gradient descent algorithm is applied, the gradient term in the weight update rule is multiplied by a coefficient α called the learning rate. If the learning rate is too small, convergence will be slow. On the other hand, if the learning rate is too large, it can cause the cost function to oscillate and the iterations to overshoot the global minimum or even diverge. In general, a lower loss corresponds to a better result; the numeric sketch below illustrates this trade-off.</p> + + </div> + </div>
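+  <p>The following sketch (a generic one-dimensional example, not our actual optimizer settings) shows how the learning rate α scales the gradient-descent update on a simple quadratic loss: a tiny α converges slowly, a well-chosen α reaches the minimum, and an overly large α overshoots and diverges.</p>
+  <pre><code class="language-python">
+# Gradient descent on the quadratic loss L(w) = (w - 2)**2,
+# showing how the learning rate alpha scales each update step.
+def loss(w):
+    return (w - 2.0) ** 2
+
+def grad(w):
+    return 2.0 * (w - 2.0)
+
+def descend(alpha, steps=10, w=0.0):
+    for _ in range(steps):
+        w = w - alpha * grad(w)   # weight update rule: w := w - alpha * dL/dw
+    return w, loss(w)
+
+print(descend(alpha=0.01))   # too small: w barely moves, convergence is slow
+print(descend(alpha=0.5))    # suitable: w reaches the minimum at w = 2, loss is 0
+print(descend(alpha=1.1))    # too large: updates overshoot and the loss grows
+  </code></pre>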
+  <div class="frame-box" onclick="handleClick()"> + <div class="frame-title xioabiaoti" style="text-align:center;" title="click to see detail">Loss function</div> + <div class="frame-content collapsed" title="double click to close"> + <p>The loss function measures the degree of deviation between the predicted and actual values. A smaller loss indicates better prediction performance.</p> + <p></p> + </div> + </div> + +</div> </div> </div> <div class="dahe"> @@ -385,32 +398,34 @@ Wet lab experiments apparently favors the success of our model, since extremely </div> </div> <div class="dahe"> -<div class="dabiaotihe" id="s8">Conclusion and Future Perspectives</div> +<div class="dabiaotihe" id="s8">Success and Future Prospects</div> <div style="text-align: center;"> <img src="https://static.igem.wiki/teams/4815/wiki/devider.png"> </div> <div class="wenbenkuang"> -<p>By constructing artificial intelligence and base mutation models, we were able to predict expression -intensity based on core promoter sequences and -simulate the promoter evolution process. This study verifies that the pre-training and fine-tuning -paradigm can effectively address the issue of small dataset size. In -the future, we plan to integrate both models to identify the sequence mutations that cause changes -in expression levels. Furthermore, we aim to identify the sequences -responsible for high promoter expression and study the biological mechanisms underlying these high -expression sequences in conjunction with wet lab experiments. This -will allow for a better understanding of the functional regions of promoters.</p> -<p> -Furthermore, larger-scale and longer-term wet lab experiments can be conducted to measure a greater -amount of data, collecting tens of thousands of low-throughput, -high-precision data points. These data can then be fed back into our computational experiments, -allowing us to further improve the goodness of fit of our AI model. -Additionally, multiple generations of cultivation followed by sequencing can be performed, enabling -evolutionary analysis to identify hotspots and mutation preferences -in the promoter region. This information can be used to optimize the base mutation model and can -also provide data for studying factors that influence expression -intensity due to sequence mutations in regions other than the core promoter. These researches can in -turn help improve our understanding of promoter structure and -enable the addition of new parameters to our base mutation model.</p> +<style> + .indented-paragraph { + padding-left: 2em; + text-indent: -1em; + position: relative; + } + + .indented-paragraph::before { + content: "• "; + position: relative; + left: 0; + top: 0; + } + +</style> + + <p class="indented-paragraph">Our AI Pymaker and base mutation models give us <a style="color:red">a brand new understanding</a> of the evolutionary pattern of yeast promoter sequences, and <a style="color:red">a deep insight</a> into the highly complex mechanisms behind the interaction between cis- and trans-acting elements.</p> + <p class="indented-paragraph">Our AI Pymaker and base mutation models give us the ability to <a style="color:red">predict expression</a> intensity based on core promoter sequences and <a style="color:red">simulate</a> the promoter <a style="color:red">evolution</a> process, which, to our knowledge, has never been done this successfully before. Our success in experiments demonstrates that our models are theoretically and practically powerful.</p> + <p class="indented-paragraph">Our AI Pymaker and base mutation models verify that the <a style="color:red">‘pre-train + fine-tuning’ paradigm</a> can effectively and practically <a style="color:red">address the issue of small dataset size</a>.</p> + <p class="indented-paragraph">Our AI Pymaker and base mutation models can be integrated to identify the sequence mutations that cause changes in expression levels. In other words, we aim to <a style="color:red">identify</a> the sequences responsible for high promoter expression, <a style="color:red">the functional hotspots in promoter sequences that remain unknown to this day</a>, and to study the biological mechanisms underlying these high-expression sequences in conjunction with wet lab experiments in the future.</p> + <p class="indented-paragraph">All of these things can be accomplished using our AI Pymaker and base mutation models.</p> + +<p>Furthermore, larger-scale and longer-term wet lab experiments can be conducted to measure a greater amount of data, collecting tens of thousands of low-throughput, high-precision data points. These data can then be fed back into our computational experiments, allowing us to further improve the goodness of fit of our AI model. Additionally, multiple generations of cultivation followed by sequencing can be performed, enabling evolutionary analysis to identify hotspots and mutation preferences in the promoter region. This information can be used to optimize the base mutation model and can also provide data for studying factors that influence expression intensity due to sequence mutations in regions other than the core promoter. This research can in turn help improve our understanding of promoter structure and enable the addition of new parameters to our base mutation model.</p> </div> </div> <div class="dahe">