miRNA Selection


Abstract


Tear fluid contains many types of miRNA. To quantify only the miRNA selected as a disease biomarker, the amplification system must therefore be sequence-specific. In the Dry Lab, we searched a database built from a comprehensive survey of miRNAs predicted to be present in tear fluid and found multiple sequences similar to the miRNAs chosen as targets in POIROT. Distinguishing these from the targets using SDA alone would be difficult, which supports adopting the highly specific TWJ system for amplification.

Introduction


Sequence specificity is one of the most important properties of an miRNA amplification and quantification system. In a system with low specificity, amplification can be triggered not only by the target miRNA but also by miRNAs with similar sequences, so the output signal no longer reflects the concentration of the target miRNA in the sample. Because human-derived samples contain numerous miRNAs, amplification triggered by non-target miRNAs must be avoided in order to accurately quantify the miRNA selected as a biomarker. Indeed, the Wet Lab confirmed that SDA without a TWJ complex amplifies even miRNAs that differ from the target by only a few bases (refer to Results).

On the other hand, the required level of sequence specificity depends on which miRNAs the sample contains. If the sample contained no miRNAs with sequences similar to the target, a lower level of specificity would suffice. Highly sequence-specific systems such as TWJ rely on more complex reactions involving more types of molecules, which can bring drawbacks such as reduced amplification efficiency. Hence, higher specificity is not always better. What matters is reliability: the system should quantify the biomarker accurately even in the presence of miRNAs that may exist in the sample and interfere with it.

With this in mind, the Dry Lab aimed to contribute to the Wet Lab and the project by using data from previous studies to search for similar sequences among the miRNAs that may be present in the sample.

Methods


Few studies have comprehensively explored miRNAs in tear fluid, and no suitable data were available for searching for sequences similar to the miRNAs selected as biomarkers. We therefore considered a meta-analysis, which integrates and reanalyzes multiple research reports. However, our Human Practices discussion with Professor Ochiya, an expert in disease-related miRNAs, made clear that extracting raw data from different studies, aligning the data formats, and reanalyzing them is quite challenging. It also became clear that a meta-analysis of miRNAs in tear fluid, an area that has not yet been researched enough, could not ensure reliability. Although meta-analyses aimed at identifying disease-specific biomarker miRNAs are actively being conducted, no established method exists yet, and developing a highly reliable one within the one-year duration of the project would be difficult. Selecting target miRNAs by an unreliable method would amount to abdicating our responsibility as scientists. We concluded that it would be more appropriate to rely on the data from a single, reliable paper that has comprehensively explored miRNAs. We therefore used the data from the study [1] introduced by Professor Ochiya, which comprehensively explored miRNAs in aqueous humor, and investigated whether similar sequences are present in the sample.

The study [1] reported the detection of 1623 types of miRNAs in aqueous humor. However, the data presented in the study do not include the sequence of each miRNA, so sequences similar to the target cannot be searched for directly. We therefore created a database of 1493 miRNAs by referring to miRBase Release 22.1 [2], which contains numerous reported miRNA sequences, and extracting the human miRNAs whose names match those in the study's data, together with their sequences.
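The name-matching step can be sketched as follows. This is a minimal illustration rather than the project's actual code: it assumes the sequences are available as FASTA text whose header lines begin with the miRNA name (as in miRBase's mature-sequence file), and the function names, the toy entries `hsa-miR-X1` to `hsa-miR-X3`, and their sequences are all hypothetical.

```python
def parse_mature_fasta(text):
    """Parse FASTA text into {name: sequence}.

    The name is taken as the first whitespace-delimited token of each
    header line (assumed layout: ">name accession description").
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def build_database(fasta_text, detected_names):
    """Keep human ("hsa-") entries whose names appear in detected_names.

    Returns (database, missing), where missing lists the requested names
    that had no matching human entry, so they can be inspected by hand.
    """
    mature = parse_mature_fasta(fasta_text)
    wanted = set(detected_names)
    database = {
        name: seq
        for name, seq in mature.items()
        if name in wanted and name.startswith("hsa-")
    }
    missing = sorted(wanted - set(database))
    return database, missing


# Toy input: hypothetical names and sequences, for illustration only.
toy_fasta = """>hsa-miR-X1 MIMAT0000001 Homo sapiens miR-X1
UACCCUGUAG
>mmu-miR-X1 MIMAT0000002 Mus musculus miR-X1
UACC
>hsa-miR-X2 MIMAT0000003 Homo sapiens miR-X2
UGUAAAC
"""
database, missing = build_database(toy_fasta, ["hsa-miR-X1", "hsa-miR-X3"])
print(database)
print(missing)
```

Returning the unmatched names separately makes it easy to see which entries from the study's list could not be paired with a sequence.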

General similarity search software, such as BLAST, aims to evaluate the homology between sequences and uses its own scoring system to assess the statistical significance of a match [3]. While such software is very useful for finding similar sequences, its evaluation does not take the interactions between nucleotides into consideration. It may therefore not accurately capture the concern of this project, namely that miRNAs similar to the target bind to the templates and trigger unintended amplification reactions. Consequently, we developed a program that searches for similar sequences using simpler and more intuitive scoring metrics: the percentage of matching nucleotides and the number of mismatches. Although these metrics may not accurately evaluate the interactions between similar miRNAs and templates either, they are easier for users to understand. This makes it easier to set search conditions according to the expected performance of the amplification system, and it helps narrow down the similar sequences that should be considered when selecting target miRNAs and designing the system, as well as those that should be verified through wet experiments or simulations.

In this program, all miRNAs from the database that have a similarity score above a certain threshold when compared to a given miRNA sequence are output. The algorithm is as follows:

  1. Input the following three parameters: "The sequence of the miRNA for which similar sequences are to be searched", "The minimum length of the subsequence to be compared", "The minimum similarity score threshold for the output sequences".
  2. Extract one miRNA from the database.
  3. Compare the lengths of the input sequence and the extracted database sequence, and consider every n such that (length of the shorter sequence) ≧ n ≧ (minimum subsequence length).
  4. For each n, extract consecutive subsequences of n nucleotides from both the input sequence and the database sequences, and calculate the similarity (either the percentage of matching nucleotides or the number of mismatches).
  5. Calculate the similarity for all possible n from step 3 and all possible subsequences from step 4, taking the maximum value as the similarity score between the two sequences.
  6. Execute steps 2 to 5 for all sequences in the database, and display all miRNAs with similarity scores above the given threshold.

Figure 1. Behavior of similar sequence search program.
The program extracts sequences from the database and evaluates their similarity to the target sequence by checking for base matches.
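The numbered steps above can be sketched in a few lines of Python. This is a minimal reading of the description, not the project's published program: the function names, the toy sequences, and the choice to treat a score equal to the threshold as a hit are assumptions, and only the match-percentage metric is shown.

```python
def match_ratio(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similarity_score(query, candidate, min_len):
    """Steps 3 to 5: best match ratio over all pairs of equal-length
    consecutive subsequences whose length n is at least min_len."""
    shorter = min(len(query), len(candidate))
    best = 0.0  # stays 0.0 if either sequence is shorter than min_len
    for n in range(min_len, shorter + 1):
        for i in range(len(query) - n + 1):
            for j in range(len(candidate) - n + 1):
                ratio = match_ratio(query[i:i + n], candidate[j:j + n])
                best = max(best, ratio)
    return best


def search_similar(query, database, min_len, threshold):
    """Steps 2 and 6: score every database entry and keep those at or
    above the threshold, highest score first."""
    hits = []
    for name, seq in database.items():
        score = similarity_score(query, seq, min_len)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data (hypothetical sequences, not taken from the real database).
toy_db = {
    "toy-1": "ACGUACGAACGU",  # one mismatch against the query
    "toy-2": "UUUUUUUUUUUU",  # unrelated sequence
}
for name, score in search_similar("ACGUACGUACGU", toy_db, min_len=10, threshold=0.8):
    print(name, round(score, 3))
```

Because the score is a maximum over subsequences, mismatches near either end of a sequence can fall outside the best-scoring window. A candidate may therefore score higher than its full-length match percentage would suggest.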

In this project, hsa-miR-10b-5p, hsa-miR-375, and hsa-miR-30d-5p were selected as target miRNAs based on the ratios of their concentrations in the aqueous humor of glaucoma patients to those in the control group. Similarity searches for these three miRNAs were conducted using the method described above. Based on the Wet Lab's specificity evaluation using the 22 nt miRNA hsa-let-7b and the let-7 family, the minimum subsequence length was set to 20, and sequences with a match ratio of 80% or higher to the target miRNAs were extracted.
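As a rough check on what these settings imply, the short sketch below computes how many mismatches a compared window can contain while still reaching a given match ratio. The helper name `max_mismatches` is hypothetical, and the sketch assumes that a ratio exactly equal to the threshold counts as a hit.

```python
def max_mismatches(window_len, min_ratio):
    """Largest mismatch count m with (window_len - m) / window_len >= min_ratio."""
    m = 0
    while m < window_len and (window_len - (m + 1)) / window_len >= min_ratio:
        m += 1
    return m


# Window lengths that can occur for 22 to 23 nt miRNAs with a minimum of 20.
for n in range(20, 24):
    print(n, max_mismatches(n, 0.80))
```

Under these assumptions, every window length from 20 to 23 tolerates at most 4 mismatches. A sequence with more mismatches over its full length can still be extracted if some window of at least 20 nucleotides contains 4 or fewer.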

Results


hsa-miR-10b-5p

miRNAs that are similar in sequence to hsa-miR-10b-5p include hsa-miR-10a-5p, hsa-miR-100-5p, hsa-miR-99a-5p, and hsa-miR-99b-5p. miR-10a-5p consists of 23 nt, differing from miR-10b-5p at only the 12th nucleotide. miR-100-5p, miR-99a-5p, and miR-99b-5p consist of 22 nucleotides, with 4, 5, and 6 mismatches, respectively, when aligned with the 3' end of miR-10b-5p.

hsa-miR-375

The similar sequence search found no miRNAs with sequences similar to hsa-miR-375.

hsa-miR-30d-5p

hsa-miR-30a-5p and hsa-miR-30e-5p were identified as miRNAs with sequences similar to hsa-miR-30d-5p. All three miRNAs are 22 nucleotides long; hsa-miR-30a-5p has 1 mismatch with hsa-miR-30d-5p, and hsa-miR-30e-5p has 2 mismatches.

Figure 2. Similar miRNA sequences found in similar sequence search of (a) hsa-miR-10b-5p and (b) hsa-miR-30d-5p.
Light green bases in similar sequences represent mismatches with the target sequence.

Conclusion and Future Prospects


The similar sequence search against miRNAs expected to be present in tear fluid found similar sequences for miR-10b-5p and miR-30d-5p. Some of these miRNAs differ from the biomarkers by only 1 or 2 nucleotides, and the wet results indicate that simple SDA would struggle to distinguish them. Because the similar miRNAs found in aqueous humor are also thought to be present in tear fluid, they could cause false detections when quantifying miRNAs in tears. These findings support using the more sequence-specific TWJ, rather than simple SDA, in the amplification system, and they led to improvements in the project.

The program created this time can search for similar sequences of miRNAs contained in various samples other than aqueous humor by changing the database being targeted. POIROT can be used for the quantification of any miRNA by modifying the template sequence. Therefore, when applied to samples other than tear fluid, it is expected to contribute to the appropriate selection of biomarkers, ensuring the specificity and quantitativeness of the system by searching for similar sequences of target miRNAs within a database that includes miRNAs present in the sample.

The current search used a similarity metric based solely on nucleotide matches and mismatches. It is also possible to evaluate the binding energy between the template and miRNAs similar to the target, and to assess the impact of these similar sequences on the amplification system through simulations using an ODE model. Because these evaluations take longer to compute than the metric used here, using this program first to narrow down candidates from a large number of miRNAs makes the overall assessment more efficient.

The program and database used for the sequence search in this study have been incorporated into the software and are publicly available on GitLab. For more details, please visit: Software

References


  1. Tanaka, Y., Tsuda, S., Kunikata, H. et al. (2014). Profiles of Extracellular MiRNAs in the Aqueous Humor of Glaucoma Patients Assessed with a Microarray System. Sci Rep 4, 5089. https://doi.org/10.1038/srep05089

  2. Faculty of Biology, Medicine and Health, The University of Manchester. miRBase. https://www.mirbase.org

  3. The National Center for Biotechnology Information. The Statistics of Sequence Similarity Scores. https://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html