2024 Competition
Bangkok-NMH
Commits
e4e6ed5a
Commit
e4e6ed5a
authored
5 months ago
by
wuttigaisorn
fix(fixed-content): fixed-content
parent
d7646003
Pipeline
#483645
passed
5 months ago
Stage: build
Stage: deploy
src/contents/engineering.tsx
+0
-122
0 additions, 122 deletions
@@ -298,128 +298,6 @@ export function Engineering() {
these two types of transcription initiation have distinct regulatory
mechanisms that likely benefit from individualized modeling approaches.
</
p
>
<
h3
>
Cycle 3 (Added 24th September)
</
h3
>
<
p
>
While researching how promoter sequence affects gene expression and
transcription (the process most influenced by DNA sequence), we found
that there are two modes of transcription initiation: focused and
dispersed. In focused initiation, almost all transcripts start from a
single transcription start site (TSS). In dispersed initiation,
transcripts start from multiple TSSs spanning about 100 bp. The two
modes have very different regulatory sequences and initiation
mechanisms. We therefore hypothesized that training separate models for
the focused and dispersed modes would improve the results, because each
model could then account for an additional feature: the transcription
mechanism.
</
p
>
<
h3
>
Build
</
h3
>
<
p
>
To implement this additional feature, we separated the original patient
dataset into focused and dispersed groups. Each gene ID is associated
with numerous transcription start sites (TSSs). For each gene, we
measured the distance between the TSS at Q1 (25th percentile) and the
TSS at Q3 (75th percentile): if the distance is more than 5 base pairs,
the gene is classified as dispersed; otherwise, it is classified as
focused. After the split, one model is trained on the focused dataset
and one on the dispersed dataset in each of experiments 5 and 6.
</
p
>
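The Q1/Q3 rule described above can be sketched as follows. This is a minimal illustration, not the team's actual pipeline: the names `GeneTss`, `quantile`, `classifyGene`, and `splitByMode` are hypothetical, the quantile uses linear interpolation (one of several common definitions), and each entry in `tssPositions` is assumed to be one observed TSS coordinate in base pairs. Only the 5 bp threshold and the Q1-to-Q3 distance come from the text.

```typescript
// Hypothetical names throughout; only the Q1/Q3 distance rule and the
// 5 bp threshold are taken from the description above.
type InitiationMode = "focused" | "dispersed";

interface GeneTss {
  geneId: string;
  tssPositions: number[]; // assumed: one coordinate (bp) per observed TSS
}

// Quantile with linear interpolation on an already sorted array.
function quantile(sorted: number[], q: number): number {
  if (sorted.length === 0) throw new Error("quantile of empty array");
  const pos = (sorted.length - 1) * q;
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
}

// A gene is "dispersed" when the Q1-to-Q3 distance exceeds the threshold.
function classifyGene(tssPositions: number[], thresholdBp = 5): InitiationMode {
  const sorted = tssPositions.slice().sort((a, b) => a - b);
  const spread = quantile(sorted, 0.75) - quantile(sorted, 0.25);
  return spread > thresholdBp ? "dispersed" : "focused";
}

// Split genes into the two groups used to train the separate models.
function splitByMode(genes: GeneTss[]): Record<InitiationMode, string[]> {
  const groups: Record<InitiationMode, string[]> = { focused: [], dispersed: [] };
  for (const gene of genes) {
    groups[classifyGene(gene.tssPositions)].push(gene.geneId);
  }
  return groups;
}
```

A distance of exactly 5 bp counts as focused here, because the text classifies a gene as dispersed only when the distance is more than 5 base pairs.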
<
p
>
To test the hypothesis, we constructed two more experiments, 5 and 6,
for a total of four models. Each experiment has two model variants: a
dispersed-TSS model and a focused-TSS model. The models in experiment 5
mirror the structure of experiments 1 and 3 by using a long input
sequence. The models in experiment 6 mirror the structure of
experiments 2 and 4 by using a shortened input sequence.
</
p
>
<
h3
>
Test
</
h3
>
<
p
>
Using the same metrics as before, R-squared and RMSE, we compared the
dispersed and focused models of experiments 5 and 6 with experiments 3
and 4 to see how performance changes when the data are split into
focused and dispersed transcription. These experiments are comparable
because they all use our custom patient data.
</
p
>
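The text names R-squared and RMSE without giving their formulas, so the sketch below rests on an assumption: R-squared is taken as the coefficient of determination, 1 - SS_res / SS_tot, and RMSE as the square root of the mean squared residual. The function names are hypothetical, and the team's own evaluation code may use a different definition (for example, a squared Pearson correlation).

```typescript
// Assumed definitions; the source does not state which variant was used.
function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Root mean squared error between actual and predicted expression values.
function rmse(actual: number[], predicted: number[]): number {
  if (actual.length === 0 || actual.length !== predicted.length) {
    throw new Error("inputs must be non-empty and of equal length");
  }
  let sumSquaredError = 0;
  for (let i = 0; i < actual.length; i++) {
    const residual = actual[i] - predicted[i];
    sumSquaredError += residual * residual;
  }
  return Math.sqrt(sumSquaredError / actual.length);
}

// Coefficient of determination: 1 - SS_res / SS_tot.
// Returns NaN (or -Infinity) if all actual values are identical (SS_tot = 0).
function rSquared(actual: number[], predicted: number[]): number {
  if (actual.length === 0 || actual.length !== predicted.length) {
    throw new Error("inputs must be non-empty and of equal length");
  }
  const actualMean = mean(actual);
  let ssRes = 0;
  let ssTot = 0;
  for (let i = 0; i < actual.length; i++) {
    ssRes += (actual[i] - predicted[i]) * (actual[i] - predicted[i]);
    ssTot += (actual[i] - actualMean) * (actual[i] - actualMean);
  }
  return 1 - ssRes / ssTot;
}
```

The two metrics answer different questions: R-squared is relative to the variance of the actual values in each dataset, while RMSE is an absolute error. A model can therefore score a higher R-squared and a higher RMSE at the same time when its dataset has a wider spread of target values.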
<
p
>
For experiment 5 (long input sequence), the dispersed model achieved an
R-squared of 0.358 and an RMSE of 0.423, an improvement of +0.029 in
R-squared and -0.017 in RMSE over experiment 3 (long input sequence).
The focused model achieved an R-squared of 0.479 and an RMSE of 0.529
(Table 2), a +0.15 improvement in R-squared but a higher error.
</
p
>
<
p
>
For experiment 6 (short input sequence), the dispersed model achieved
an R-squared of 0.328 and an RMSE of 0.419, an improvement of +0.034 in
R-squared and -0.024 in RMSE over experiment 4 (short input sequence).
The focused model achieved an R-squared of 0.479 and an RMSE of 0.539,
a +0.185 improvement in R-squared but a 0.096 increase in RMSE.
</
p
>
<
img
className
=
"image-center"
width
=
{
"
80%
"
}
src
=
"https://static.igem.wiki/teams/5251/gra-1.jpg"
/>
<
p
className
=
"image-caption"
>
Figure 5: Predicted value (y-axis) plotted against actual value
(x-axis). Top-left: experiment 5 (dispersed); top-right: experiment 6
(dispersed); bottom-left: experiment 5 (focused); bottom-right:
experiment 6 (focused).
</
p
>
<
p
>
Table 2: R-squared and RMSE of experiments 3, 4, 5, and 6.
</
p
>
<
table
>
<
tr
>
<
td
>
Experiment / Results
</
td
>
<
td
>
R-squared
</
td
>
<
td
>
Root mean squared error (RMSE)
</
td
>
</
tr
>
<
tr
>
<
td
>
3 (long input)
</
td
>
<
td
>
0.329
</
td
>
<
td
>
0.440
</
td
>
</
tr
>
<
tr
>
<
td
>
4 (short input)
</
td
>
<
td
>
0.294
</
td
>
<
td
>
0.443
</
td
>
</
tr
>
<
tr
>
<
td
>
5 (dispersed and long input)
</
td
>
<
td
>
0.358
</
td
>
<
td
>
0.423
</
td
>
</
tr
>
<
tr
>
<
td
>
6 (dispersed and short input)
</
td
>
<
td
>
0.328
</
td
>
<
td
>
0.419
</
td
>
</
tr
>
<
tr
>
<
td
>
5 (focused and long input)
</
td
>
<
td
>
0.479
</
td
>
<
td
>
0.529
</
td
>
</
tr
>
<
tr
>
<
td
>
6 (focused and short input)
</
td
>
<
td
>
0.479
</
td
>
<
td
>
0.539
</
td
>
</
tr
>
</
table
>
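As a quick arithmetic check, the differences between experiment 6 and its short-input baseline (experiment 4) can be recomputed from the Table 2 values. The sketch below is illustrative only: the `Metrics` type and the helper names are hypothetical, and the numbers are copied from the table rows for experiments 4 and 6.

```typescript
// Hypothetical helper types; the values are the Table 2 rows for
// experiment 4 and both experiment 6 models.
interface Metrics {
  r2: number;
  rmse: number;
}

const experiment4: Metrics = { r2: 0.294, rmse: 0.443 };
const experiment6Dispersed: Metrics = { r2: 0.328, rmse: 0.419 };
const experiment6Focused: Metrics = { r2: 0.479, rmse: 0.539 };

// Round to three decimals to hide floating-point noise.
function round3(x: number): number {
  return Math.round(x * 1000) / 1000;
}

// Difference of a model relative to its baseline, per metric.
function delta(model: Metrics, baseline: Metrics): Metrics {
  return {
    r2: round3(model.r2 - baseline.r2),
    rmse: round3(model.rmse - baseline.rmse),
  };
}

console.log(delta(experiment6Dispersed, experiment4)); // { r2: 0.034, rmse: -0.024 }
console.log(delta(experiment6Focused, experiment4)); // { r2: 0.185, rmse: 0.096 }
```

A positive `r2` difference and a negative `rmse` difference both indicate improvement, so the dispersed model improves on both metrics while the focused model gains R-squared at the cost of a higher RMSE.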
<
h3
>
Learn
</
h3
>
<
p
>
To conclude, both the focused and the dispersed models achieved a
higher R-squared than the corresponding unsplit experiments (3 and 4),
with a much larger gain for the focused models. However, the RMSE
increased for the focused models and decreased for the dispersed
models. The dispersed model’s R-squared decreased when we shortened the
input (0.358 to 0.328), while the focused model’s R-squared stayed at
0.479.
</
p
>
<
p
>
These results support our hypothesis that modeling dispersed and
focused transcription separately helps predict the expression of
neoantigen proteins, although the higher RMSE of the focused models
shows that the gain is not uniform across metrics. They may be useful
for future teams working on neoantigen expression prediction.
</
p
>
<
p
>
[1] J. Ferlay et al., “Cancer incidence and mortality patterns in
Europe: Estimates for 40 countries in 2012,” Eur. J. Cancer, vol. 49,
...