{% block lead %}Demonstrate engineering success in a part of your project by going through at least one iteration of the
engineering design cycle.{% endblock %}
<br>
<div class="model-heading">
<h1 id="engineering-heading">
Engineering Success
</h1>
</div>
<div class="Content" style="max-height: 100%;">
<p><strong>WET LAB</strong></p>
<p>
In the lab, designing each minor experiment itself is an experience that takes one through the DBTL cycle. As
outlined in our Lab Notebook, even staple molecular biology techniques like PCR and electroporation can prove to
be challenging when approaching it for the first time with new organisms and parts to test.
</p>
<p>
The first wall we hit in our experimental journey was the amplification of our ordered sequences. After
thoroughly checking the primers and matching them against the plasmid we were using, we were dejected to find no
amplified products. Our first thought was to reduce the annealing temperature, since the temperature calculated
from the sequence information can deviate slightly from the true optimum. After multiple iterations of this, we
obtained satisfactory results only by implementing touchdown PCR with a range of annealing temperatures. Finally,
we were able to obtain all the amplified products by reducing our reaction volume, which likely improved heat
transfer during cycling.
</p>
<p>
Our next hurdle was electroporation, the standard approach for transforming
recombinant DNA into <i>L. lactis</i>. However, we obtained very few colonies after the first attempt.
This could have been due to residual chemicals left in the cuvettes after disinfection from previous attempts,
which may have degraded the DNA. But even after thorough washing steps, we were only able to add 3 strains to our
initial library of 10 strains. We continued to struggle with this issue, as we could not successfully
obtain variants of the larger natural RBS sequence that we randomized.
</p>
<p>
Despite these setbacks, we were able to generate an RBS library with sufficient variation for our further
metabolic engineering needs. This also shows that apart from the large-scale, hypothesis-driven DBTL cycles one
goes through in research, there are everyday decisions to contend with about how to run experiments and how to
troubleshoot when things don't go your way. Being able to produce results in both the wet lab and the dry lab,
and in the process learning how to circumvent or solve the problems of daily research work, is our engineering
success.
</p>
<br>
<p><strong>MODEL TRAINING AND TESTING</strong></p>
<p><u>DBTL Iteration 1</u></p>
<p>We designed and trained commonly used regression-based machine learning models: a Linear Regressor, a
Support Vector Machine Regressor, a k-Nearest Neighbour Regressor, and a Random Forest Regressor. We did not
tune their hyperparameters, instead measuring their baseline performance. On testing these models, we found that
the Random Forest performed best.
</p>
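<p>A minimal sketch of this first iteration, using scikit-learn with default hyperparameters. The synthetic
dataset here is a stand-in; our real features encode the chassis, temperature, and the RBS and coding
sequences.</p>

```python
# Baseline comparison of four regressors with untuned (default) hyperparameters.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Synthetic stand-in for the expression dataset.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=23)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=23)

models = {
    "Linear": LinearRegression(),
    "SVM": SVR(),
    "kNN": KNeighborsRegressor(),
    "RandomForest": RandomForestRegressor(random_state=23),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # R^2 on the held-out test split.
    print(f"{name}: R^2 = {model.score(X_te, y_te):.3f}")
```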
<p><u>DBTL Iteration 2</u></p>
<p>We then compared the Random Forest to other ensemble-based models that use "boosting" techniques, such as an
Adaptive Boosting Regressor and a Gradient Boosting Regressor. Surprisingly, the Random Forest still performed
better, despite the use of boosting in the other models. </p>
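<p>The second iteration can be sketched the same way, again on a synthetic stand-in dataset, comparing the
bagging-based Random Forest against the two boosting ensembles by cross-validated score.</p>

```python
# Comparing the Random Forest to boosting-based ensembles (default settings).
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=23)

results = {}
for model in (RandomForestRegressor(random_state=23),
              AdaBoostRegressor(random_state=23),
              GradientBoostingRegressor(random_state=23)):
    name = type(model).__name__
    # Mean R^2 over 5 cross-validation folds.
    results[name] = cross_val_score(model, X, y, cv=5).mean()
    print(name, round(results[name], 3))
```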
<p><u>DBTL Iteration 3</u></p>
<p>We finally fine-tuned the hyperparameters of the Random Forest Regressor by performing grid search
cross-validation. We found that the combination of parameters that worked best was max_depth=80, max_features=3,
min_samples_leaf=3, min_samples_split=8, n_estimators=50 for random_state=23. We compared the performance of our
model against existing ones, with very favourable results. This comparison is described in
detail under the 'Software' page.</p>
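<p>A sketch of this grid search with scikit-learn's GridSearchCV. The grid values and the dataset here are
illustrative; the selected combination reported above is included in the grid.</p>

```python
# Grid search cross-validation over Random Forest hyperparameters.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=6, noise=5.0, random_state=23)

# Illustrative grid containing the winning parameter combination.
param_grid = {
    "max_depth": [40, 80],
    "max_features": [3],
    "min_samples_leaf": [3],
    "min_samples_split": [8],
    "n_estimators": [50],
}
search = GridSearchCV(RandomForestRegressor(random_state=23), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```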
<p>Thus, through the Design-Build-Test-Learn methodology, we identified the best-performing machine learning model
to predict the relative expression of a gene, given data about the chassis, the temperature, and the sequences of
the RBS and the coding region.</p>
<br>
<p><strong>OPTIMIZATION ALGORITHM</strong></p>
<p>We employed the same strategy to evaluate different optimization algorithms that can be used to modify the RBS
sequence to achieve a desired relative expression. We designed, implemented, and tested a Gradient Descent
optimizer, a Simulated Annealing optimizer, and a Genetic Algorithm optimizer. Our testing indicated that the
Genetic Algorithm yielded the most favorable results among the three models.</p>
<p>This strategy encompassed the following steps:</p>
<p><u>1. Design:</u> The formulation of each optimization model, such as Gradient Descent, Simulated Annealing, and
Genetic Algorithm, tailored to the specific requirements of the protein expression optimization task.
</p>
<p>
<u>2. Build:</u> The designed models were implemented into code, thus translating the theoretical constructs
into functional algorithms.
For Gradient Descent, we adjusted learning rates and convergence criteria, while for Simulated Annealing,
temperature schedules and acceptance criteria were carefully optimized. In the case of the Genetic Algorithm, we
fine-tuned parameters like population size, mutation rates, and selection strategies.
</p>
<p>
<u>3. Test:</u> We rigorously tested the optimization algorithms, considering time efficiency and correlation
with target expression levels. Simulations were conducted to compare their performance. The Genetic Algorithm
emerged as the optimal choice due to its efficient resource utilization, high correlation with target levels,
and consistent performance across various scenarios.
</p>
<p>
<u>4. Learn:</u> The results obtained from this Design-Build-Test process indicated that the Genetic Algorithm
produced the most favorable results among the three algorithms. This suggested to us that, in the context of
optimizing gene expression, the Genetic Algorithm demonstrated superior performance in reaching the desired
target, leading us to select it as the main optimization algorithm.
</p>
<p>This thorough and systematic approach allowed us to gain a comprehensive understanding of the strengths and
weaknesses of each algorithm in the context of protein expression optimization. It also highlighted the
adaptability and robustness of the Genetic Algorithm in achieving the desired protein expression targets.</p>
<p>Our optimization algorithm was put to a final test by generating different RBS sequences for GFP expression in <i>L.
lactis</i>. These were then implemented in the wet lab. The results of this testing showed good differentiation
between high- and low-expressing RBS variants, and are described in detail under the 'Results' page.</p>
{% endblock %}