@@ -123,17 +123,19 @@ export function Description() {
</p>
<h3> Machine Learning Architecture </h3>
<p>
We are currently testing multiple model design choices. The performance of each design is evaluated using deviations from the original MPE. Our experimental stage consists of 3 different designs of the ML workflow.
We are currently testing multiple model design choices. The performance of each design is evaluated using deviations from the original MPE. Our experimental stage consists of 2 different designs of the ML workflow.
<br/><br/>
Figure 3: Different current ML workflows
<br/><br/>
a. The processed DNA sequences are input into the Enformer model to predict MPE.
b. The processed DNA sequences are tokenized and input into a Recurrent Neural Network (RNN) model to predict MPE.
c. The processed DNA sequences are passed through DNABert and traditional ML to predict MPE.
<br/><br/>
Our team chose the DNABert and Enformer models because both are suitable for training using DNA sequences as input. RNN is also used because it is suitable for sequence-based data and self-supervised learning.
The selection process combines predictions from the HLA compatibility classifier (not included in this study) and the predicted expression level (this study) to ensure that selected proteins are immunologically relevant and biologically practical for DC vaccine development.
This dual filtering strategy ensures that only proteins that are potentially good matches for the patient's HLA type and are expressed at sufficient levels in tumor cells are chosen as candidates.
Our team chose the Enformer model because its pre-training incorporates long-range interactions in the genome (up to 100 kb away). The Enformer model can therefore better capture the relationship between DNA sequences and the MPE value.
Additionally, we chose an RNN because it is suitable for sequence-based data and self-supervised learning. In [14], Enformer achieved the highest accuracy among all the models compared,
which makes it the most promising model for our experiment.
<br/><br/>
To conclude, our overall selection process combines predictions from the HLA compatibility classifier (not included in this study) with the predicted neoantigen expression level (this study) to ensure that selected proteins are immunologically relevant and biologically practical for DC vaccine development.
This dual filtering strategy ensures that only proteins that are potentially good matches for the patient's HLA type and are expressed at sufficient levels in tumor cells are chosen as candidates.