<p>The mechanisms underlying sRNA gene regulation vary with the regulated gene, the sRNA sequence, and the involvement of chaperone proteins, among other factors. One of the main challenges in sRNA design is therefore finding patterns that accurately characterize this behavior.</p>
<p>For the baseline design of the sRNA constructs, our team followed the guidelines of the iGEM Paris-Bettencourt (2013) project [1], which state that, according to Na et al. (2013) [2], the sRNA must base-pair perfectly with the first 24 nucleotides of the target mRNA. Downstream of the target-binding sequence, an Hfq-binding domain recruits the chaperone protein Hfq; this domain is taken from the MicC sRNA with its native binding site for the OmpC mRNA removed.</p>
<p>However, these criteria consider neither the structural and thermodynamic profile of the sRNA nor its hybridization efficiency.</p>
<p>Tafer et al. (2008) [3] analyzed how the structure of a target RNA sequence affects RNA interference (RNAi), based on the accessibility of the target site. They subsequently developed a tool called RNAxs to aid in the selection of highly efficient siRNAs. The program RNAplfold was used to slide a window along the mRNA and evaluate the probability that each stretch is unpaired at thermodynamic equilibrium, thus ensuring that the site is accessible. Although the mechanisms underlying siRNA effects differ from those of sRNAs (siRNAs are exclusive to eukaryotic cells), Vazquez-Anderson et al. (2017) [4] also take target accessibility into account for asRNA hybridization.</p>
</div>
</section>
<section id="MysiRNA_ERNAi" class="page-section">
<div class="row">
<strong>MysiRNA Design center</strong>
<p>Mysara et al. (2015) [5], in turn, developed a siRNA design tool that considers conserved-sequence targeting, SNP and off-target avoidance, and target accessibility, among other criteria.</p>
</div>
<strong>ERNAi</strong>
<p>The tool filters candidate siRNAs by specificity and efficiency, considering their sequence and structural properties [6].</p>
</div>
</section>
<section id="Titulo3" class="page-section">
<section id="AddConsid" class="page-section">
<div class="row text-center">
<h2>Additional considerations upon sRNA gene regulation and Hfq interaction</h2>
<p>According to Na et al. (2013) [2], a consensus scaffold sequence on bacterial sRNAs that recruits the Hfq protein facilitates both the hybridization between the sRNA and the target mRNA and the degradation of the mRNA. In the vicinity of the TIR (Translation Initiation Region), there is a direct correlation between the binding free energy and the gene silencing efficiency: lower (more negative) free energies correspond to stronger gene repression.</p>
<p>In E. coli, most sRNAs that bind to mRNAs depend on the chaperone protein Hfq. Most characterized sRNAs block translation by binding directly to the ribosome-binding site (RBS) in the 5’-UTR of the target mRNA, which prevents 30S ribosome binding and translation initiation (Fig. 2C). However, when the sRNA binds to a sequence in the 5’-UTR that inhibits translation, the RBS can become available, allowing translation initiation (Fig. 2D) [7].</p>
<p>In other cases, RNA-bound Hfq has been shown to be involved in the recruitment of RNase E and in sRNA-mRNA decay, although this mechanism remains unclear. According to Lalaouna et al. (2013) [7], one pathway suggests that mRNAs become more sensitive to RNase E attack after base-pairing with sRNAs, because they lose the protection conferred by translating ribosomes (Fig. 2B). Another pathway proposes that the recruitment of RNase E to the target mRNA triggers the formation of an sRNA/Hfq/RNase E complex that favors degradation by RNase E (Fig. 2A).</p>
<p>It is important to note that when the target sites are located deep in the CDS region of the mRNA, the sRNA/Hfq/mRNA/RNase E complex is more likely to execute a “degradation-only” mechanism (Wagner et al., 2015) (Figure 3) [8].</p>
<p style="text-align:center;"><h6 style="text-align: center;">Figure 3. Degradation-only mechanism mediated by the sRNA/mRNA/Hfq/RNase E complex downstream of the gene start</h6></p>
</div>
</section>
<section id="MysiRNA_ERNAi" class="page-section">
<div class="row">
<div class="col-12">
<p style="text-align:center;"><h6 style="text-align: center;">Figure 2. Main mechanisms for sRNA/mRNA/Hfq interaction</h6></p>
<strong>TIR region sRNA targeting</strong>
<p>Yoo et al. (2013) [9] propose a simple design whose main criterion is that the sRNA binds within the TIR, the region where the ribosome attaches, which extends from the SD sequence to 30 nt downstream. The proposal also takes the sRNA:mRNA binding energy into account in order to tune the translation efficiency: a binding energy of -30 to -40 kcal/mol is recommended. The binding sequence should be 20 to 30 nt long, since longer sequences increase the chance of off-target repression.</p>
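<p>As a minimal sketch, the length and binding-energy ranges quoted above can be expressed as a simple filter. The function name and its arguments are hypothetical and are not part of the protocol of Yoo et al.; the binding energy is assumed to have been computed beforehand with an external tool.</p>

```python
def meets_tir_criteria(srna_len, binding_energy,
                       len_range=(20, 30), energy_range=(-40.0, -30.0)):
    """Return True when a candidate falls inside the quoted design ranges.

    srna_len       -- length of the target-binding sequence, in nt
    binding_energy -- sRNA:mRNA binding free energy, in kcal/mol
    The default ranges are the 20-30 nt and -30 to -40 kcal/mol values
    cited in the text; the function itself is only an illustration.
    """
    length_ok = len_range[0] <= srna_len <= len_range[1]
    energy_ok = energy_range[0] <= binding_energy <= energy_range[1]
    return length_ok and energy_ok


if __name__ == "__main__":
    print(meets_tir_criteria(24, -35.2))   # inside both ranges
    print(meets_tir_criteria(40, -35.2))   # too long
```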
<p style="text-align:center;"><h6 style="text-align: center;">Figure 3. Proposed degradation pathway for sRNA/mRNA interactions downstream of the start codon</h6></p>
</div>
<strong>sRNA design considering Hfq recruiting and mRNA TIR targeting</strong>
<p>Zhu et al. (2021) [10] designed a synthetic sRNA system based on the MicC scaffold and the chaperone Hfq to control gene expression in Methylorubrum extorquens. Their design criteria for the asRNA were length, location and binding free energy. Their paper also cites Na et al. (2013) [2], who report that a 24-nucleotide asRNA targeting the translation initiation region (TIR) of the target mRNA shows high suppression activity (&gt;90%); their sRNA was designed accordingly. The online service DINAMelt was used to calculate the binding free energy between the asRNA and its target mRNA.</p>
</div>
</section>
<section id="Titulo4" class="page-section">
<section id="OurSolution" class="page-section">
<div class="row text-center">
<h2>Our solution: rnatrix</h2>
<h3 class="text-secondary">A neural network-based Python program for designing optimal sRNA sequences and predicting their downregulation or upregulation behavior</h3>
</div>
</section>
<section id="Rnatrix" class="page-section">
<div class="row">
<div class="col-12">
<h2>Our solution: sRNA Designer</h2>
<p>
As pointed out before, there are currently no software tools for designing sRNAs that target a specific gene. Current sRNA design protocols take the targeting of the TIR as the main criterion for a given sRNA; the thermodynamic properties of the sRNA:mRNA base pairing are then calculated with external tools, without affecting the previous sRNA selection. The existing software tools for asRNA design, on the other hand, focus on siRNAs and miRNAs (RNAxs, MysiRNA, ERNAi). To overcome these obstacles, our team created sRNA Designer: a Python-based pipeline that creates a dataset of sRNAs for a given mRNA and then selects the best sRNA options considering:
</p>
<ol>
<li>The optimal structural and sequence features for a given sRNA, that is, how closely it matches the characteristics of a "true sRNA" (see model 1)</li>
<li>Optimal sRNA:mRNA hybridization efficiency, considering the accessibility of the mRNA target region and the self-folding energies of the sRNA and the target mRNA (see model 2)</li>
</ol>
<p>
When an sRNA carrying an Hfq-binding domain targets sites within the CDS, downstream of the start codon, it acts through a “degradation-only” mechanism. Our team therefore initially inferred that sRNAs designed with better structural and sequence features (see point 1) and a higher hybridization efficiency with the mRNA (see point 2) can achieve higher gene repression than a conventionally designed sRNA that simply binds the first 24 nt of the mRNA CDS.
</p>
<p>As pointed out before, there are currently no software tools for designing sRNAs that target a specific gene. Current sRNA design protocols take the targeting of the TIR as the main criterion for a given sRNA; the thermodynamic properties of the sRNA:mRNA base pairing are then calculated with external tools, without affecting the previous sRNA selection.</p>
<p>The existing software tools for asRNA design, on the other hand, focus on siRNAs.</p>
</div>
<p>To overcome these obstacles, our team created rnatrix: a Python-based pipeline that creates a dataset of sRNAs for a given mRNA and then selects the best sRNA options considering:</p>
<p>1. The optimal structural and sequence features for a given sRNA, that is, how closely it matches the characteristics of a "true sRNA" (see model 1)</p>
<p>2. Optimal sRNA:mRNA hybridization efficiency, considering the accessibility of the mRNA target region and the self-folding energies of the sRNA and the target mRNA</p>
<p>3. The probability that the sRNA performs an upregulation or a downregulation role in the cell. This feature is calculated with a neural network-based model, previously trained on a dataset collected by our team</p>
</div>
</section>
<section id="Titulo5" class="page-section">
<section id="Scoring" class="page-section">
<div class="row text-center">
<h2>STEP 1. SCORING</h2>
<h3 class="text-secondary">sRNA thermodynamic scoring based on models 1 and 2</h3>
</div>
</section>
<section id="Transcript" class="page-section">
<div class="row">
<div class="col-12">
<h2>Design pipeline</h2>
<p></p>
<h4>Blast alignment</h4>
<p>
First, an alignment is performed to establish which parts of the target gene sequence are conserved. The software sends a request to NCBI BLAST to search the ‘nt’ database, with the DNA sequence of the target as the query. Once the alignment is finished, the XML file with the data retrieved from BLAST is loaded into the pipeline and parsed to obtain the conserved sequences. Only the alignments with an E-value lower than 10<sup>-10</sup> are taken into account. For each alignment, the XML file gives its location on the target sequence together with its E-value. To obtain highly conserved sequence ranges, the frequency with which each nucleotide position appears across all the alignments is calculated, and only positions with a frequency higher than 90% are retained. Finally, every uninterrupted run of 24 or more consecutive nucleotides with a frequency higher than 90% is considered a conserved sequence, which gives the position and length of each conserved region.
</p>
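<p>The conserved-region step described above can be sketched as follows. This is only an illustration under stated assumptions: each BLAST hit is represented as a hypothetical <code>(start, end, evalue)</code> tuple with 1-based inclusive coordinates on the target sequence, and the frequency of a position is taken to be the fraction of retained hits that cover it. Parsing the BLAST XML itself is left out.</p>

```python
def conserved_regions(target_len, hits, max_evalue=1e-10,
                      min_freq=0.9, min_len=24):
    """Return (start, end) ranges (1-based, inclusive) of conserved runs.

    hits -- iterable of (start, end, evalue) tuples; this layout is an
            assumption made for the example, not the BLAST XML schema.
    """
    # Keep only alignments below the E-value threshold.
    kept = [(s, e) for s, e, ev in hits if ev < max_evalue]
    if not kept:
        return []

    # Count how many retained alignments cover each target position.
    coverage = [0] * target_len
    for s, e in kept:
        for pos in range(s - 1, e):
            coverage[pos] += 1

    # Collect uninterrupted runs whose frequency exceeds min_freq.
    regions, run_start = [], None
    for pos in range(target_len + 1):
        conserved = pos < target_len and coverage[pos] / len(kept) > min_freq
        if conserved and run_start is None:
            run_start = pos
        elif not conserved and run_start is not None:
            if pos - run_start >= min_len:
                regions.append((run_start + 1, pos))
            run_start = None
    return regions


if __name__ == "__main__":
    example_hits = [(5, 50, 1e-30)] * 10 + [(20, 30, 1e-30), (1, 60, 1e-2)]
    print(conserved_regions(60, example_hits))
```

<p>In this example the last hit is discarded by the E-value filter, and positions 5 to 50 are covered by at least 10 of the 11 retained hits, so they form a single conserved region.</p>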
<p>Note that the BLAST alignment is an optional step kept for further pipeline improvement: since the genes used for proof of concept in the present project are carried on vectors, there is no need to identify consensus sequences.
</p>
<div class="col">
<strong>Transcription to mRNA and sRNA dataset creation</strong>
<p>Since every sRNA must bind a 24-nt mRNA region (a length the user can change), a 24-nt sliding window is moved over the identified conserved DNA sequences; for each window, the transcribed mRNA segment and its complementary binding sRNA are generated. An Hfq-binding domain is then attached to the 3’ end of every sRNA for further analysis.</p>
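<p>A minimal sketch of this dataset-creation step is shown below. It assumes that the input is the coding (sense) DNA strand, so that transcription amounts to replacing T with U, and it uses a placeholder string for the Hfq-binding scaffold; the real MicC-derived sequence is not reproduced here. All function names are hypothetical.</p>

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def transcribe(dna):
    """Transcribe a sense-strand DNA sequence into mRNA (assumption: T -> U)."""
    return dna.upper().replace("T", "U")


def antisense(rna):
    """Return the reverse complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def build_pairs(dna_region, scaffold, window=24):
    """Slide a window over the region and build one sRNA per target site.

    scaffold -- Hfq-binding domain appended to the 3' end of each sRNA;
                a placeholder is used in the example below.
    """
    mrna = transcribe(dna_region)
    pairs = []
    for start in range(len(mrna) - window + 1):
        target = mrna[start:start + window]
        pairs.append({
            "start": start + 1,                      # 1-based target position
            "target": target,                        # mRNA target region
            "srna": antisense(target) + scaffold,    # binding part + scaffold
        })
    return pairs


if __name__ == "__main__":
    placeholder_scaffold = "NNNN"  # stand-in, not the real Hfq-binding domain
    for pair in build_pairs("ATG" + "C" * 24, placeholder_scaffold):
        print(pair["start"], pair["target"], pair["srna"])
```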
<ul>
<p>The scoring method taken from models I and II is then applied to all the created sRNA:mRNA pairs.</p>
<strong>Sequence and structural information-based sRNA scoring (score 1)</strong>
<p>The score taken from Model I is used to assess how closely each created sequence resembles a “true sRNA”, since its original purpose was to separate sRNA sequences from non-sRNA ones in PresRAT, the server for identifying bacterial sRNA sequences developed by Kumar and collaborators (Kumar et al. 2021) [11].</p>
<strong>Sequence score and Uracil load scoring (score 1)</strong>
<p>These parameters are calculated directly from the sRNA sequence alone, using the formulas explained in the model section.</p>
<p></p>
</div>
</div>
</section>
<section id="M1ABE_N" class="page-section">
<div class="row">
<div class="col">
<strong>Local minima profiling (energy landscape) and RNA suboptimal structures (score 1)</strong>
<p>As stated before, both the ABE and ALE scores require information about the local minima of the RNA. In the present work, the programs RNAsubopt and barriers (Lorenz et al., 2011) [12] were used to calculate the number of local minima of a given RNA sequence together with their free energy values. Some RNA molecules form meta-stable structures, which are represented as local minima on an energy landscape. The barriers algorithm identifies all the local minima in the energy landscape of a given RNA sequence and the energy barriers separating them (Gruber et al., 2008) [13]. Chen &amp; Burke (2015) [14] give a concise explanation of how the two programs are used: RNAsubopt computes all the conformations that a given RNA can adopt within a defined energy range (in kcal/mol) above the minimum free energy (MFE) of the sequence, while barriers takes the suboptimal structures produced by RNAsubopt and finds all the local minima and the saddle points between them. Each structure produced by RNAsubopt is one of the following:</p>
<ol>
<li>A local minimum</li>
<li>A saddle point connecting at least two local minimum points, or</li>
<li>The basin of one local minimum</li>
</ul>
<p>
To calculate the structures of the local minima, RNAsubopt first generates a list of suboptimal structures in dot-bracket notation, each with its folding energy; in this step only the sequence and the energy range above the MFE must be specified. Taking these suboptimal structures as input, barriers then calculates the structures of all local minima (also in dot-bracket notation) together with the folding energy of each one. The maximum number of local minima that the program may compute must be specified. Because the ABEscore and ALEscore terms in the present project only use the average features of the listed local minima, we observed no appreciable effect when this maximum is changed from the default value indicated by Gruber et al. (2008) (max. 50).
</p>
<p>
On the other hand, the number of suboptimal structures calculated by RNAsubopt grows exponentially with the sequence length and with the energy range above the MFE, and the computing time increases accordingly. Our team therefore ran a series of tests to identify the energy range in which the RNAsubopt output is still reliable for calculating ABEscore and ALEscore while keeping the computing time to a minimum; as a result, we defined 5 kcal/mol as an adequate energy range. The algorithm for calculating the local minima structures and their folding energies is shown in Figure 5.
</p>
<p style="font-size:14px; text-align: center;"><b>Figure 5. </b>Algorithm for calculating the structure of local minima and their respective self-folding energies</p>
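<p>The two-program workflow above can be sketched as a small wrapper. This is an illustration, not the team's actual code: it assumes that the <code>RNAsubopt</code> and <code>barriers</code> executables are installed and on the PATH, and that the barriers listing has one local minimum per line in the form "index, dot-bracket structure, energy, ..."; the parser below relies on that assumed layout. The 5 kcal/mol range and the maximum of 50 minima are the values quoted in the text.</p>

```python
import subprocess


def build_commands(delta_e=5.0, max_minima=50):
    """Command lines for the two steps (sorted suboptimals, then barriers)."""
    subopt_cmd = ["RNAsubopt", "-s", "-e", str(delta_e)]
    barriers_cmd = ["barriers", "-G", "RNA", "--max=" + str(max_minima)]
    return subopt_cmd, barriers_cmd


def parse_minima(barriers_output):
    """Extract (structure, energy) pairs from a barriers-style listing.

    Assumed line layout: index, dot-bracket structure, energy, further
    columns. Lines that do not match (e.g. the sequence line) are skipped.
    """
    minima = []
    for line in barriers_output.splitlines():
        fields = line.split()
        if (len(fields) >= 3 and fields[0].isdigit()
                and set(fields[1]) <= set("().")):
            minima.append((fields[1], float(fields[2])))
    return minima


def local_minima(sequence, delta_e=5.0, max_minima=50):
    """Run RNAsubopt, pipe its output into barriers, return the minima."""
    subopt_cmd, barriers_cmd = build_commands(delta_e, max_minima)
    subopt = subprocess.run(subopt_cmd, input=sequence + "\n",
                            capture_output=True, text=True, check=True)
    barriers = subprocess.run(barriers_cmd, input=subopt.stdout,
                              capture_output=True, text=True, check=True)
    return parse_minima(barriers.stdout)
```

<p>Calling <code>local_minima(sequence)</code> requires both programs to be installed; <code>parse_minima</code> can be tried on its own with any saved listing that follows the assumed layout.</p>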
<p>
Once all of the 4 individual scores (ABE, ALE, Sequence and U-rich) are calculated, a normalizing function is applied to the entire dataset.
</p>
<h4>Hybridization efficiency scoring</h4>
<p>In this part, the free energy of base pairing between the sRNA and the mRNA target region (ΔG asT) and the free energy of local folding of the target mRNA region, extended by one nucleotide in the 3’ and 5’ directions when available (ΔG tF), are calculated using RNAfold from the ViennaRNA suite. An accessibility factor θ is then calculated using the NUPACK package. All of these parameters are used to calculate score 2, which estimates the hybridization efficiency of the generated sRNAs; as with score 1, the data are normalized so that both scores can be added into a final score on a 0-2 scale. This scoring is taken from model II (Vazquez-Anderson et al., 2017).</p>
<h4>Final scoring and data delivery to user</h4>
<p>The final score is the sum of both scores, each given equal weight. Since they measure different parameters, we consider this operation non-redundant. When the pipeline ends, the user receives a .csv file with the dataset of all created sRNA:mRNA pairs and every calculated parameter; the last columns hold the normalized scores 1 and 2 and the final score.</p>
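<p>The equal-weight sum can be illustrated with a short sketch. The text does not state which normalizing function is applied, so min-max scaling to the 0-1 range is used here purely as an assumption; the function names are hypothetical.</p>

```python
def min_max(values):
    """Scale a list of numbers to the 0-1 range (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All candidates are identical for this score; avoid dividing by zero.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def final_scores(score1, score2):
    """Add the two normalized scores with equal weight (0-2 scale)."""
    norm1, norm2 = min_max(score1), min_max(score2)
    return [a + b for a, b in zip(norm1, norm2)]


if __name__ == "__main__":
    # Three candidate sRNA:mRNA pairs with made-up raw scores.
    print(final_scores([1, 2, 3], [10, 30, 20]))
```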
<h4>Final sRNA:mRNA best pairs selection</h4>
<p>Although sRNAs are created along the whole of each analyzed gene, our team decided to consider only those that bind within the first 100 nt downstream of the gene start. There is not enough information to predict the behavior of sRNAs at sites far downstream of the start codon. In addition, after analyzing the databases used for the neural network training, we found that most sRNAs that pair within the CDS target sites near the gene start.</p>
</ol>
</div>
</div>
</section>
<section id="Titulo6" class="page-section">
<div class="row text-center">
<h2>That is when we ran into a question.</h2>
<h3 class="text-secondary">Is there a way to predict sRNA function?</h3>
</div>
<h4>Research</h4>
<p>
After several futile attempts to find either a mathematical or computational model in literature able to accurately predict sRNA functioning (that is, whether its interaction with the target mRNA will result in an up- or down-regulation), we concluded a new approach for its prediction was due.
</p>
<h4>Design</h4>
<p>
During our research we realized that there might be enough sRNA-mRNA interactions reported in literature to build a functioning machine learning model. Nonetheless, these were spread across multiple databases and publications. For our first iteration, we used []’s dataset to construct a baseline model consisting of a neural network with only a few layers and neurons. The hybridization window was obtained through the RNAup library and then this was used to calculate features derived from our sRNA designer model: ALE score, ABE score, U score, sequence score, mRNA-sRNA base pairing minimum free energy, and self-folding mRNA target region minimum free energy.
Unfortunately, this initial model was not able to capture the behavior of the data, suffering from both high variance and bias. Thus, in order to further improve our predictions, we had to gather and homogenize observations from other sources with the help of the NCBI database. This homogenization was needed because many of these interactions did not readily contain all the data needed to calculate the desired features. This being done, we then had to look for new features as well as apply careful preprocessing.
</p>
<h4>Model</h4>
<p>
For our model’s design, the sequences were first preprocessed, since many of them were not suitable candidates and had to be dropped. The hybridization positions were then located and used to calculate the training parameters. The following features, as well as their squared values, were calculated with the help of NUPACK and ViennaRNA:
</p>
<p style="text-align:center;"><h6 style="text-align: center;"><strong>Table 1.</strong> Considered features</h6></p>
<p>Thus, our final training dataset consisted of 557 observations with 40 features. Data preprocessing was largely done with scikit-learn utilities (such as StandardScaler, LabelEncoder and train_test_split), and the result was then fed to a TensorFlow neural network for training.</p>
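<p>To illustrate how each feature and its squared value end up in the same feature vector, the sketch below augments a row with squares and then standardizes every column. It is a pure-Python stand-in written for this explanation, not the project's code: the standardization mimics what StandardScaler does (zero mean, unit population standard deviation), and the function names are hypothetical.</p>

```python
from statistics import mean, pstdev


def add_squares(row):
    """Append the square of every feature, doubling the vector length."""
    return list(row) + [x * x for x in row]


def standardize(rows):
    """Column-wise z-scores using the population standard deviation.

    Constant columns are left at zero by using a divisor of 1.0.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in rows]


if __name__ == "__main__":
    raw = [[1.0, 2.0], [3.0, 2.0]]            # two observations, two features
    augmented = [add_squares(r) for r in raw]  # four features each
    print(standardize(augmented))
```

<p>With 20 base features, <code>add_squares</code> yields the 40-feature vectors mentioned above.</p>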
</div>
<p style="text-align:center;"><h6 style="text-align: center;"><strong>Figure 8.</strong> Model summary taken from TensorFlow</h6></p>
<section id="M1ABE_N" class="page-section">
<div class="row">
<div class="col">
<p>It is important to note that the registered parts were selected according to the scoring criteria alone (the final score of sRNAs binding within 100 nt downstream of the gene start), since the next step, the neural network-based prediction, was incorporated after we had ordered the DNA parts for synthesis.</p>
<p style="text-align:center;"><h6 style="text-align: center;"><strong>Figure 11.</strong> Model metrics from TensorFlow</h6></p>
<section id="Step2" class="page-section">
<div class="row text-center">
<h2>STEP 2. REGULATION BEHAVIOR PREDICTION</h2>
<h3 class="text-secondary">Neural network-based prediction of the upregulation or downregulation role of the scored sRNAs</h3>
</div>
</section>
<section id="References" class="page-section">
<h2>References</h2>
<ol>
<li>
<p>Fu, H., Elena, R. C., & Marquez, P. H. (2019). The roles of small RNAs: Insights from bacterial quorum sensing. ExRNA, 1(1), 32. <a href="https://doi.org/10.1186/s41544-019-0027-8" target="blank">https://doi.org/10.1186/s41544-019-0027-8</a></p>
</li>
<li>
<p>Hofacker, I. L. (2008). The Vienna RNA Websuite. Nucleic Acids Research, 36(Web Server), W70–W74. <a href="https://doi.org/10.1093/nar/gkn188" target="blank">https://doi.org/10.1093/nar/gkn188</a></p>
</li>
<li>
<p>Kucharík, M., Hofacker, I. L., Stadler, P. F., & Qin, J. (2014). Basin Hopping Graph: A computational framework to characterize RNA folding landscapes. Bioinformatics, 30(14), 2009–2017. <a href="https://doi.org/10.1093/bioinformatics/btu156" target="blank">https://doi.org/10.1093/bioinformatics/btu156</a></p>
</li>
<li>
<p>Kumar, K., Chakraborty, A., & Chakrabarti, S. (2021). PresRAT: A server for identification of bacterial small-RNA sequences and their targets with probable binding region. RNA Biology, 18(8), 1152–1159. <a href="https://doi.org/10.1080/15476286.2020.1836455" target="blank">https://doi.org/10.1080/15476286.2020.1836455</a></p>
</li>
<li>
<p>Lorenz, R., Bernhart, S. H., Höner zu Siederdissen, C., Tafer, H., Flamm, C., Stadler, P. F., & Hofacker, I. L. (2011). ViennaRNA Package 2.0. Algorithms for Molecular Biology, 6(1), 26. <a href="https://doi.org/10.1186/1748-7188-6-26" target="blank">https://doi.org/10.1186/1748-7188-6-26</a></p>
</li>
<li>
<p>Tafer, H., Ameres, S. L., Obernosterer, G., Gebeshuber, C. A., Schroeder, R., Martinez, J., & Hofacker, I. L. (2008). The impact of target site accessibility on the design of effective siRNAs. Nature Biotechnology, 26(5), 578–583. <a href="https://doi.org/10.1038/nbt1404" target="blank">https://doi.org/10.1038/nbt1404</a></p>
</li>
<li>
<p>Trotta, E. (2014). On the Normalization of the Minimum Free Energy of RNAs by Sequence Length. PLoS ONE, 9(11), e113380. <a href="https://doi.org/10.1371/journal.pone.0113380" target="blank">https://doi.org/10.1371/journal.pone.0113380</a></p>
</li>
<li>
<p>Vazquez-Anderson, J., Mihailovic, M. K., Baldridge, K. C., Reyes, K. G., Haning, K., Cho, S. H., Amador, P., Powell, W. B., & Contreras, L. M. (2017). Optimization of a novel biophysical model using large scale in vivo antisense hybridization data displays improved prediction capabilities of structurally accessible RNA regions. Nucleic Acids Research, 45(9), 5523–5538. <a href="https://doi.org/10.1093/nar/gkx115" target="blank">https://doi.org/10.1093/nar/gkx115</a></p>
</li>
<li>
<p>Woodson, S. A. (2010). Compact Intermediates in RNA Folding. Annual Review of Biophysics, 39(1), 61–77. <a href="https://doi.org/10.1146/annurev.biophys.093008.131334" target="blank">https://doi.org/10.1146/annurev.biophys.093008.131334</a></p>
</li>
<li>
<p>Yoo, S. M., Na, D., & Lee, S. Y. (2013). Design and use of synthetic regulatory small RNAs to control gene expression in Escherichia coli. Nature Protocols, 8(9), 1694–1707. <a href="https://doi.org/10.1038/nprot.2013.105" target="blank">https://doi.org/10.1038/nprot.2013.105</a></p>
</li>
<li>
<p>Zhu, L. P., Song, S. Z., & Yang, S. (2021). Gene repression using synthetic small regulatory RNA in Methylorubrum extorquens. Journal of Applied Microbiology, 131(6), 2861–2875. <a href="https://doi.org/10.1111/jam.15159" target="blank">https://doi.org/10.1111/jam.15159</a></p>
</li>
</ol>
</section>
<section id="Neural-Network" class="page-section">
<div class="row">
<div class="col">
<p>Once the entire dataset with all possible sRNA:mRNA pairs has been created and the scoring criteria have been applied to all pairs (and re-normalized, this time on a 0-1 scale), our software loads the previously trained neural network model (see model) to calculate the probability that each sRNA exerts an upregulation or a downregulation role.</p>
<p>Note that only one probability is calculated for each sRNA: the probability of its predicted role, that is, of either upregulating or downregulating the desired gene. For the final sRNA selection, the information in Table 1 is processed through a lambda function (see model).</p>
<h6 style="text-align: center;">Table 1. Final features considered for best sRNA selection. The predicted role is 0 for downregulation and 1 for upregulation. The sRNA string is not loaded into the neural network model as text, and is shown in this table only for illustration purposes</h6>
<strong>Important notes: </strong><p>Although the neural network uses all of the parameters calculated while scoring the sRNA:mRNA pairs, it uses none of the final scores, only their individual components without any normalization. In addition, some of the parameters used to predict the regulation behavior are not calculated during the scoring process (see model).</p>
<p>Finally, the user receives the best sRNA:mRNA target pairs (according to the lambda function), with their hybridization positions on the entire mRNA transcript, in a .csv file called “results.csv”.</p>