{% extends "layout.html" %} {% block title %}ENGINEERING{% endblock %} {% block lead %}OUR SUCCESS AND CHALLENGES WITH THE ENGINEERING CYCLE{% endblock %} {% block page_content %}
Goal: Our project’s goal was to effectively detect PFAS (specifically, PFOA) by utilizing e. coli for detection.
With this, we design 3 main pathways:
All part sequences are deposited into the registry and are publicly available. All genes were printed from Genscript and subcloned into pUC57-Kan or pUC57
After testing, there was no significant difference in fluorescence as the amount of PFOA added increased. This applies to every construct we tested (pRMA, FAB-GFP, STF 1/2). For a detailed look at results, please refer to our results page.
Since there was no significant difference in fluorescence as the amount of PFOA added increased, more testing is needed. Additionally, we needed to validate that the proteins were correctly expressed.
A Virtual Cell (VCell) is a useful tool to simulate the reactions of complex reaction pathways to compute the predicted concentrations of each species at any given point in time. It is capable of deterministically simulating reactions with differential equations and mass-action kinetics or stochastically by probabilistically simulating individual chemical reactions based on collisions. Our aim with Vcell this year was to computationally predict which gene construct would be the most effective in producing GFP in the presence of PFAS. We only simulated a single cell for time and simplicity, however, future works could expand on our work by simulating multiple cells and testing new pathways for PFAS detection.
In Cycle 1, we first had to build the structure for the future cycles. First, we had to research the constructs we had to design to find out how to start constructing them. We decided to first look at the pRMA_GFP construct and end with the pLex_GFP construct. We first started with the design of the constructs. Each construct had the same geometry.
Geometry:
Cell Size: 3.5 uM^3
Environment: 1000 uM^3
Cell size was bassed on Harvard Bionumbers
Environment size was determined by taking the reciprocal of the approximate max density of E. coli in batch culture, about 10^9 cells per milliliter of water (Tuttle et al., 2021), which is about 1000um^3.
Once a construct was designed we would input rate constants, and run stochastic simulations with the construct. In the simulations, we would use varying amounts of PFAS and see how much GFP was produced.
Simulation Parameters:
Time: 10,000 seconds
PFAS uM : 0.01, 0.1, 0, 1, 5, 10, 50
Solver: Gibson
Most rate constants were estimated and few were taken from the literature review. This is a problem we would like to address in future studies, ase construct would produce more SF_GFP. There is a direct relationship between PFAS concentrations and SF_GFP production. This construct helped us prepare for future models as they followed a similar structure. However, since most of the rates were estimated, we can not say for certain how accurate this model is. In future studies, we will look for more rate constants to have a higher accuracy in the output of this model.
We designed the reaction network building off last year reaction network PFAS Detector V2. We changed the reaction diagram so that when the pRMA_operon and the PFAS bonded and produced the complex the complex would produce the GFP. However, the GFP we used in our reaction diagram was a variant of GFP called SF_GFP(Super-folder GFP). Some reactions were taken from literature or pulled from the PFAS DetectorV2 Biomodel but others were estimated. This construct was the first one of the 4 constructs we diagramed and simulated on Vcell and this construct came as a learning guide for the creation of the future constructs.
Cell volume was chosen as 3.5 uM^3 while the Environment volume was set to 1000 uM^3 and time was measured in seconds.
Model name: pRMA_GFP construct 1 v1.3
Owner: Pillow123
pRMA_GFP Construct #1:
We ran the model with a variety of different starting conditions with stochastic solvers. We were able to visualize SuperFolder GFP or SF_GFP production in response to varying levels of PFAS.
The graph was made using the Google Sheets platform by extracting the data tables from VCell and pasting them onto sheets. We were then able to create a graph with multiple independent variables.
We can see from the data that having more micromolars of PFAS into the system the construct would produce more SF_GFP. There is a direct relationship between PFAS concentrations and SF_GFP production. This construct helped us prepare for future models as they followed a similar structure. However, since most of the rates were estimated, we can not say for certain how accurate this model is. In future studies, we will look for more rate constants to have a higher accuracy in the output of this model.
We designed the reaction network building on a paper we found on this topic. The FAB_GFP gene would be constantly produced by the pConst promoter and would bind with the PFAS to produce the complex. Similar to the construct before most rate constants were estimated however the binding affinity between the PFAS and the FAB_GFP protein was calculated and other rate constants were pulled from literature studies. This was the 2nd of the 4 constructs we had and it provided a base design for future constructs that we designed.
Cell volume was chosen as 3.5 uM^3 while the Environment volume was set to 1000 uM^3 and time was measured in seconds.
Model name: FAB_GFP_Construct v1.1
Owner: Pillow123
pRMA_GFP Construct #1:
The results for the PFAS_FAB_GFP were different than the pRMA_GFP construct but the style of the simulations was the same. All the same, constants were used (1000 seconds and the same solver). There were different rate constants overall showing differences in numbers but both of them provided convincing results.
The graph was also made using Google Sheets. The data was imported from Vcell as an HDF5 file and then put onto Google Sheets, where the graph was made. The graph used several ranges allowing for us to have several lines. The legend on the side helps improve the comprehensiveness of the graph and readability. Additionally with variety of colors can increase the readability of the graph as lines are now distinguished from one another.
We can see a direct relationship between the amount of micromolars of PFAS and the amount of PFAS_FAB_GFP being produced. The higher the micromolar count the higher count of PFAS_FAB_GFP. This is similar to the pRMA_GFP graph which shows a common relationship . This model helped become a model for future constructs and helped provide rate constants for future graphs.
This construct had a similar design to the FAB_GFP construct however the main difference is that when PFAS and the Synt Transcription Factor were binded they produced GFP. This model was where the FAB_GFP model helped a lot. Rate constants were estimated and similar to those of the FAB_GFP model but the binding affinity of PFAS and Synt_Tran factor was calculated. This was the 3rd of the 4 constructs with the last being pLex_GFP
Cell volume was chosen as 3.5 uM^3 while the Environment volume was set to 1000 uM^3 and time was measured in seconds.
Model name: Synthetic_Transcription_factorv1.1
Owner: Pillow123
SyntTranscriptionFactor_GFP Construct #3:
We ran stochastic simulations on the reaction diagram with varying micromolars of PFAS and ran to see its effect on the GFP protein. Different rate constants were used which caused variance in the amount of GFP produced between the constructs.
Unlike the other graphs, we can see in all the uM values of PFAS that there were near 0 for the first 100 seconds. Then they all separated around 1250 seconds.
We can see like all other graphs that this one had a direct relationship. The more PFAS in the system the more GFP was produced. There was no promoter leak so PFAS 0uM stayed at 0uM which makes logical sense. A future study on this construct would be to see if making it a dual construct (constructs 3 & 4) would optimize the detection of PFAS.
We designed the reaction network building off of last year’s reaction network PFAS Detector V2 similar to the pRMA_GFP construct. We simply replaced the pRMA_operon with the pLex_operon and changed the GFP variant. In this construct, unlike the 1st construct, the GFP variant will simply be GFP, unlike the SF_GFP variant used in the pRMA_GFP construct. Also like the pRMA_operon the pLex_operon has a leak which we estimated using scientific studies.The pLex_operon would bind with the PFAS to create the complex which would then produce the GFP_mRNA.
Cell volume was chosen as 3.5 uM^3 while the Environment volume was set to 1000 uM^3 and time was measured in seconds.
Model name: pLex_GFP Construct
Owner: Pillow123
pLex_GFP Construct #4:
We ran stochastic simulations on the reaction diagram with varying micromolars of PFAS and ran to see its effect on the GFP protein. Different rate constants were used which caused variance in the amount of GFP produced between the constructs.
We can see that each PFAS uM value has a varying amount at the 2000-second mark but all start to curve from there too. The 2000-second mark represents their “max”.
With this last construct, we can analyze all the graphs and come to a conclusion on which pathway was the most successful in detecting PFAS. However, additionally, cycles may be necessary as the rate constants were heavily estimated and few were based on scientific research. However, looking at the graph we can see that the 0.01uM graph produced the least while the 50 uM produced the most showing a direct relationship between the uM of PFAS and the GFP count. Additionally, due to a promoter leak, the 0uM produced more than the 0.1uM and 0.01uM. This was also the only graph that produced a true stochastic curve, while the others produced exponential graphs.
Further research shows that a combination of constructs 3 & 4 enhances the production of GFP in the presence of PFAS. Due to an error in the creation of constructs 3 and 4 which were supposed to work together to produce the GFP we had to recreate construct 3 which is a combination of the 2 constructs. When PFAS is introduced into the system and binds with the Synthetic Transcription factor it allows it to bind to the pLex promoter which induces transcription. Using this information a new model was developed using previous rate constants from the models and new simulations were run.
Cell volume was chosen as 3.5 uM^3 while the Environment volume was set to 1000 uM^3 and time was measured in seconds.
Model name: Construct 3_Cycle 2
Owner: Aryanshah16
SyntTranscriptionFactor Construct:
The previous construct 3 and 4 were combined into one Biomodel.
We ran stochastic simulations on the reaction diagram with varying micromolars of PFAS and ran to see its effect on the GFP protein production. Different rate constants were used which caused variance in the amount of GFP produced between the constructs.
The graph was made using the Google Sheets platform by extracting the data tables from VCell and pasting them onto sheets. We were then able to create a graph with multiple independent variables.
Going through the rate constant, where PFOA binds to the synthetic transcription factor, it directly correlates with the amount of GFP produced. Multiplying the rate constant by ½ would result in a 50% decrease in GFP production, etc. GFP was produced at similar amounts based on each volume of PFAS.
Further research shows that a combination of constructs 3 & 4 enhances the production of GFP in the presence of PFAS. Due to an error in the creation of constructs 3 and 4 which were supposed to work together to produce the GFP we had to recreate construct 3 which is a combination of the 2 constructs. When PFAS is introduced into the system and binds with the Synthetic Transcription factor it allows it to bind to the pLex promoter which induces transcription. Using this information a new model was developed using previous rate constants from the models and new simulations were run.
Cell volume was chosen as 3.5 uM^3 while the Environment volume was set to 1000 uM^3 and time was measured in seconds.
Model name: Construct 3_Cycle 2
Owner: Aryanshah16
SyntTranscriptionFactor Construct:
The previous construct 3 and 4 were combined into one Biomodel.
We ran stochastic simulations on the reaction diagram with varying micromolars of PFAS and ran to see its effect on the GFP protein production. Different rate constants were used which caused variance in the amount of GFP produced between the constructs.
The graph was made using the Google Sheets platform by extracting the data tables from VCell and pasting them onto sheets. We were then able to create a graph with multiple independent variables.
Going through the rate constant, where PFOA binds to the synthetic transcription factor, it directly correlates with the amount of GFP produced. Multiplying the rate constant by ½ would result in a 50% decrease in GFP production, etc. GFP was produced at similar amounts based on each volume of PFAS.
In the previous iterations, there were rate constants that were fully estimated without sources. In this cycle, a new source was added to each rate constant that did not have one previously. All sources can be found within our publicly available VCell Biomodels. PFAS testing ranges were recomputed. Regulations are on the parts per trillion (ppt) level, so we used 1 ppt as our minimum concentration. Assuming 1 ppt=1 ng/ml and the molar weight of PFOA, a common model PFAS, is 414 g/mol, that means 1ppt is approximately 2E-6 uM of PFOA. Additionally, we realized stationary phase E. coli are smaller than 3.5 um^3 (Azam et al., 1999) and are closer to 1 um^3. Since real life application of our biosensor would likely use cells that are already at steady state equilibrium, we ran simulations where protein and mRNA is at steady state equilibrium before PFAS is introduced. Steady state concentrations were determined using deterministic models that ran until concentrations ceased fluctuations. For the synthetic transcription factor Biomodel, a new degradation pathway for synthetic transcription factor molecules bound to PFOA to be degraded was introduced. For the FAB-GFP model, PFAS is made to be regenerated when the PFAS.FAB-GFP protein is degraded.
Model name: Construct3_Cycle3_dgl Model owner: dglVcell
Model name: FAB_GFP_Construct_v1.1_Douglas Model owner: dglVcell
Simulations were run for 100 minutes (6000 seconds). Every plot shown is stochastic using the Gibson solver.
The improved models showed very intriguing results. A PFAS-inducible transcription factor with a relatively strong binding affinity appears to produce the greatest amount of signal in response to a very small amount of PFAS. Even when there is only about a single PFAS molecule per cell (at 10^-6 uM), the synthetic transcription factor system can produce a large amount of GFP (about 5000 molecules per cell). The FAB-GFP system appears to be limited by the amount of PFAS that can bind to the GFP. At extremely low concentrations of PFAS, like those set as the minimal safety limits, there may be only one bound FAB-GFP producing fluorescence in a cell. For both constructs, it appears that the cell actually depletes the amount of PFAS that is available externally due to strong binding affinities. It is important to note that while the binding affinity of FAB-GFP has been experimentally characterized by the original creators, the binding affinity of the synthetic transcription factor was set to be the empirical binding affinity of human estrogen receptor alpha to PFOA, which may not accurately reflect the actual binding affinity of PFOA or other PFAS to the entire synthetic transcription factor.
At an earlier stage in the project, we attempted to use AI to design proteins that could be used to bind to PFAS, acting as a biosensor. While we were not able to incorporate a fully functional model into the final project, we have documented our efforts and progress so that the work could be expanded on.
The Umbrella Sampling technique used in molecular dynamics is especially useful in this project as it manually enhances the sampling of complex energy landscapes of protein data banks, involving free energy calculations and calculations of other thermodynamic variables. It can overcome large energy barries and simulate rare events that other rudimentary simulations would not be able to achieve.
The program was designed as a pipeline between various molecular dynamics tools. However, the most important part of the pipeline is the Umbrella Sampling process indicated by the caption on the arrow between the windows rectangle and metafile/cv files rectangle.
Below is the pseudocode outlining the program above including the SMD pulling loop, Umbrella Sampling, wham analysis, and binding coefficient algorithm. This paraphrases the code itno simple words, and the full code in python is in the software gitlab repository.
BEGIN
# Setup Directories
DEFINE master_path as current directory
DEFINE test_path as subdirectory "tests"
DEFINE forcefield_path, pdb_path
IF directories do not exist
CREATE necessary directories (e.g., pdb, windows, cv, hist)
# Test Folder Naming
SET test_name = nextFolder()
DEFINE new_path as directory for current test
IF new_path does not exist
CREATE directory structure for new test (cv, windows, hist)
PRINT "New folder created"
# Initialize System
LOAD PDB file from pdb_path
CREATE Modeller to check and add missing hydrogens
APPLY forcefield to the system
# Define System and Simulation
SET system with forcefield, constraints, and parameters
INITIALIZE Langevin integrator for temperature and timestep
SET positions and velocities for the system
MINIMIZE system energy
# Parameters for Steered Molecular Dynamics (SMD)
DEFINE starting and ending values for collective variable (CV)
DEFINE pulling force constant, velocity, and number of windows
DEFINE steps and recording intervals for simulation
# Add pulling force to system
CREATE pullingForce as harmonic force on CV
ADD pullingForce to the system
INITIALIZE simulation with pulling force
# Perform Steered Molecular Dynamics (SMD) Loop
FOR each window
PERFORM simulation steps
UPDATE pulling force parameter r0
RECORD collective variable (CV) values
IF CV reaches target value for the current window
SAVE system coordinates
END IF
UPDATE progress bar
END FOR
# Save Windows and CV Data
FOR each saved window
SAVE window coordinates as PDB file
END FOR
# Run Simulations on Each Window
DEFINE function run_window(window_index)
LOAD coordinates from corresponding window
SET system positions to window coordinates
RESET temperature and run simulation for steps
RECORD collective variable (CV) values for each window
SAVE CV values to file
END FUNCTION
CALL run_window() for each window
# Generate Histogram and Save
FOR each window
LOAD CV data
GENERATE histogram of CV values
SAVE histogram plot
CREATE metafile entry for WHAM analysis
END FOR
# WHAM Analysis
PREPARE WHAM input arguments
RUN WHAM on metafile to compute PMF
SAVE WHAM output and log
# Post-WHAM Processing
LOAD PMF data from WHAM output
PARSE coordinates, free energies, and probabilities
PLOT free energy and probability graphs
SAVE graph as PNG file
# Compute Equilibrium Constant (Kd)
COMPUTE delta_G_bound and delta_G_unbound from PMF
CALCULATE standard free energy difference (ΔG)
COMPUTE dissociation constant (Kd) from ΔG
OUTPUT Kd value
END
This program was tested with the PDB that had protonated PFOA in FAB-GFP. Although deprotonated PFOA could be simulated in antechamber, we were unable to generated a forcefield file for the program to use to simulate deprotonated PFOA. Thus we had to use protonated pfoa where the forcefield was able to generate with a program called LigParGen, named and stored as:
umbrellasampling/tests/forcefields/pfoa_protonated.xml
The setup of the test includes the following paraemters:
Li | Lf | index1 | index2 | num_win | fc_pull | v_pulling | total_steps | increment_steps | wTotal | wDelta |
---|---|---|---|---|---|---|---|---|---|---|
3.0 nm | 3.2 nm | 14 (Hydrogen UNL 1) | 5439 (Hydrogen SER 349) | 100 | 1000.0 kj/mol/nm^2 | 0.02 nm/pc | 30,000 | 10 | 100,000 | 10 |
The results of the PMF graph and histogram graph are as shown below.
The Kd and delta G calculated from the PMF graph is:
Dissociation Constant | ΔG° |
---|---|
2.26 M | 0.48224 kcal/mol |
Although we were unable to simulate deprotonated PFOA with the program, the major takeaways with the test was the stark differences in the behaviors of protonated and deprotonated PFOA.
1. When the energy was minimized, the PFOA was put outside the binding pocket of FAB-GFP, thus the very high dissociation constant, or low binding strength, was explained by this result.
2. The deprotonated PFOA had a much lower dissociation constant, thus much higher bonding strength, explained by being inside the binding pocket of FAB-GFP.