{% extends "layout.html" %}

{% block title %}Engineering Success{% endblock %}
{% block lead %}Demonstrate engineering success in a part of your project by going through at least one iteration of the
engineering design cycle.{% endblock %}

{% block page_content %}

<br>
<div class="model-heading">
    <h1 id="engineering-heading">
        Engineering Success
    </h1>
</div>
<div class="Content" style="max-height: 100%;">
    <p><strong>WET LAB</strong></p>
    <p>
        In the lab, designing each experiment, however minor, takes one through the DBTL cycle. As outlined in our
        Lab Notebook, even staple molecular biology techniques like PCR and electroporation can prove to
        be challenging when approached for the first time with new organisms and parts to test.
    </p>
    <p>
        The first wall we hit in our experimental journey was the amplification of our ordered sequences. Even after
        thoroughly checking the primers and matching them against the plasmid we were using, we were dejected not to
        obtain our amplified products. Our first thought was to reduce the annealing temperature, since the temperature
        calculated from the sequence information can deviate slightly from the true optimum. After multiple iterations
        of this, we obtained satisfactory results only by implementing touchdown PCR with a range of annealing
        temperatures. Finally, we were able to obtain all the amplified products by reducing our reaction volume,
        which likely aided heat transfer during the reaction.
    </p>
    <p>
        Our next hurdle was electroporation, the standard approach for transforming our recombinant DNA into
        <i>L. lactis</i>. However, we were not able to obtain many colonies after the first attempt. This could have
        been due to residual chemicals left in the cuvettes from disinfection after previous attempts, which could
        have degraded the DNA. But even after thorough washing steps, we were only able to add 3 strains to our
        initial library of 10 strains. This issue is something we continued to struggle with, as we could not
        successfully obtain variants of the larger natural RBS sequence that we randomized.
    </p>
    <p>
        Despite these setbacks, we were able to generate an RBS library with sufficient variation for further
        metabolic engineering needs. This also shows that, apart from the large-scale, hypothesis-driven DBTL cycles
        one goes through in research, there are everyday decisions to contend with about how to run experiments and
        how to troubleshoot when things do not go your way. Being able to produce results in both the wet lab and the
        dry lab, and learning along the way how to circumvent or solve the problems of daily research work, is our
        engineering success.
    </p>
    <br>
    <p><strong>MODEL TRAINING AND TESTING</strong></p>
    <p><u>DBTL Iteration 1</u></p>
    <p>We designed and trained commonly used regression-based machine learning models: a Linear Regressor, a
        Support Vector Machine Regressor, a k-Nearest Neighbour Regressor, and a Random Forest Regressor. We did not
        tune their hyperparameters at this stage and simply measured their baseline performance. On testing these
        models, we found that the Random Forest performed best.
    </p>
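    <p>
        As a rough illustration, such a baseline comparison can be set up with scikit-learn as sketched below. The
        feature matrix <code>X</code> and target vector <code>y</code> are synthetic placeholders standing in for our
        encoded sequence data and measured relative expression values.
    </p>
    <pre><code># Sketch: baseline comparison of untuned regression models (placeholder data).
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Placeholder data; in our pipeline these would be the encoded sequences and
# the measured relative expression values.
X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "SVM Regressor": SVR(),
    "k-Nearest Neighbours": KNeighborsRegressor(),
    "Random Forest": RandomForestRegressor(random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f}")
</code></pre>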
    <p><u>DBTL Iteration 2</u></p>
    <p>We then compared the Random Forest to other ensemble-based models that use "boosting" techniques, namely an
        Adaptive Boosting Regressor and a Gradient Boosting Regressor. Surprisingly, the Random Forest still performed
        better, despite the use of boosting in the other models.</p>
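    <p>
        The boosting comparison followed the same pattern. A minimal sketch, again on placeholder data and with
        default hyperparameters rather than the exact settings we used:
    </p>
    <pre><code># Sketch: Random Forest versus boosting ensembles (placeholder data).
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=0)

ensembles = {
    "Random Forest": RandomForestRegressor(random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

for name, model in ensembles.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f}")
</code></pre>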
    <p><u>DBTL Iteration 3</u></p>
    <p>We finally fine-tuned the hyperparameters of the Random Forest Regressor by performing grid search
        cross-validation. We found that the combination of parameters that worked best was max_depth=80,
        max_features=3, min_samples_leaf=3, min_samples_split=8, n_estimators=50 with random_state=23. We compared
        the performance of our model against existing models, with very favourable results. This comparison is
        described in detail on the 'Software' page.</p>
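    <p>
        A minimal sketch of such a grid search is shown below. The grid itself is illustrative; only the reported
        optimum (max_depth=80, max_features=3, min_samples_leaf=3, min_samples_split=8, n_estimators=50,
        random_state=23) comes from our actual run, and <code>X</code>, <code>y</code> again stand in for our
        training data.
    </p>
    <pre><code># Sketch: hyperparameter tuning of the Random Forest with grid search
# cross-validation (illustrative grid, placeholder data).
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=23)

param_grid = {
    "max_depth": [40, 80, None],
    "max_features": [3, 6, "sqrt"],
    "min_samples_leaf": [1, 3, 5],
    "min_samples_split": [2, 8],
    "n_estimators": [50, 100, 200],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=23),
    param_grid,
    cv=5,
    scoring="r2",
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_)
print(search.best_score_)
</code></pre>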
    <p>Thus, through the Design-Build-Test-Learn methodology, we identified the best-performing machine learning
        model to predict the relative expression of a gene, given data about the chassis, the temperature, and the
        sequences of the RBS and the coding region.</p>
    <br>
    <p><strong>OPTIMIZATION ALGORITHM</strong></p>
    <p>We employed the same strategy to evaluate different optimization algorithms that can be used to modify the RBS
        sequence to achieve a desired relative expression. We designed, implemented, and tested a Gradient Descent
        optimizer, a Simulated Annealing optimizer, and a Genetic Algorithm optimizer. Our testing indicated that the
        Genetic Algorithm yielded the most favourable results of the three.</p>
    <p>This strategy encompassed the following steps:</p>
    <p><u>1. Design:</u> We formulated each optimization model (Gradient Descent, Simulated Annealing, and the
        Genetic Algorithm) and tailored it to the specific requirements of the protein expression optimization
        task.</p>
    <p>
        <u>2. Build:</u> The designed models were implemented in code, translating the theoretical constructs into
        functional algorithms. For Gradient Descent, we adjusted learning rates and convergence criteria, while for
        Simulated Annealing, temperature schedules and acceptance criteria were carefully optimized. In the case of
        the Genetic Algorithm, we fine-tuned parameters like population size, mutation rates, and selection strategies
        (a simplified sketch of the resulting loop is shown after these steps).
    </p>
    <p>
        <u>3. Test:</u> We rigorously tested the optimization algorithms, considering time efficiency and correlation
        with target expression levels. Simulations were conducted to compare their performance. The Genetic Algorithm
        emerged as the optimal choice due to its efficient resource utilization, high correlation with target levels,
        and consistent performance across various scenarios.
    </p>
    <p>
        <u>4. Learn:</u> The Design-Build-Test process showed that the Genetic Algorithm produced the most favourable
        results of the three algorithms. In the context of optimizing gene expression, it demonstrated superior
        performance in reaching the desired target, leading us to select it as our main optimization algorithm.
    </p>
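    <p>
        To make the Build step above concrete, the following is a much-simplified sketch of a Genetic Algorithm loop
        for tuning an RBS sequence towards a target relative expression. The scoring function, sequence length, and
        all numeric settings here are placeholders; in our pipeline the fitness evaluation is backed by the trained
        Random Forest model.
    </p>
    <pre><code># Sketch: Genetic Algorithm for RBS optimization (placeholder fitness function).
import random

BASES = "ACGT"
TARGET_EXPRESSION = 0.8   # desired relative expression (placeholder)
SEQ_LEN = 20              # RBS length (placeholder)
POP_SIZE = 50
GENERATIONS = 100
MUTATION_RATE = 0.05

def predict_expression(seq):
    # Placeholder surrogate; in practice this would call the trained regression model.
    return sum(1 for base in seq if base in "AT") / len(seq)

def fitness(seq):
    # Sequences whose predicted expression is closer to the target score higher.
    return -abs(predict_expression(seq) - TARGET_EXPRESSION)

def mutate(seq):
    # Each position is replaced by a random base with probability MUTATION_RATE.
    return "".join(
        random.choices([base, random.choice(BASES)],
                       weights=[1 - MUTATION_RATE, MUTATION_RATE])[0]
        for base in seq
    )

def crossover(parent_a, parent_b):
    # Single-point crossover between two parent sequences.
    point = random.randint(1, SEQ_LEN - 1)
    return parent_a[:point] + parent_b[point:]

population = ["".join(random.choice(BASES) for _ in range(SEQ_LEN))
              for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:POP_SIZE // 5]   # truncation selection: keep the top 20%
    offspring = [mutate(crossover(random.choice(parents), random.choice(parents)))
                 for _ in range(POP_SIZE - len(parents))]
    population = parents + offspring

best = max(population, key=fitness)
print(best, predict_expression(best))
</code></pre>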
    <p>This thorough and systematic approach allowed us to gain a comprehensive understanding of the strengths and
        weaknesses of each algorithm in the context of protein expression optimization. It also highlighted the
        adaptability and robustness of the Genetic Algorithm in achieving the desired protein expression targets.</p>
    <p>Our optimization algorithm was put to a final test by generating different RBS sequences for GFP expression in <i>L.
        lactis</i>. These were then implemented in the wet lab. The results of this testing showed a good differentiation
        between high- and low-expressing RBS variants. The results are described in detail on the 'Results' page.</p>
</div>
{% endblock %}