The workflow of the project is divided into **four steps**: data retrieval, data summarization, data processing, and model training.
2. `biobricks_data_summary/`: Simplify and summarize the data downloaded in Step 1 using Llama2. See [biobricks_data_summary/readme.md](biobricks_data_summary/readme.md) for detailed information.
3. `data/`: Summaries from Step 2 are processed and organized into a standard form ready for model input. The data we collected is stored in the directory `data/biobricks`. See [data/readme.md](data/readme.md) for detailed information.
4. `mono/`: The BERT model is trained on the BioBrick summary data starting from a pre-trained checkpoint. A web interface is also provided for users to try (a minimal query sketch follows this list). See [mono/readme.md](mono/readme.md) for more information.
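The overall idea is a reverse dictionary over BioBrick summaries: a free-text description from the user is embedded and matched against the summarized parts. Below is a minimal sketch of such a query, assuming a Hugging Face BERT encoder with mean pooling and cosine-similarity ranking; the checkpoint name and the candidate summaries are placeholders, not the actual model or data shipped in `mono/`.

```python
# Sketch of a reverse-dictionary query: embed a description with a BERT encoder
# and rank candidate BioBrick summaries by cosine similarity.
# NOTE: "bert-base-uncased" and the candidate list are illustrative placeholders.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(texts):
    """Mean-pool the last hidden states into one normalized vector per text."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)           # (B, T, 1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    return torch.nn.functional.normalize(pooled, dim=-1)

# Hypothetical candidate parts: (part ID, Llama2-style summary)
candidates = [
    ("BBa_B0034", "A strong ribosome binding site for protein expression."),
    ("BBa_C0040", "Coding sequence for the TetR repressor protein."),
    ("BBa_E0040", "Coding sequence for green fluorescent protein (GFP)."),
]

query_vec = embed(["a part that makes cells glow green"])
part_vecs = embed([summary for _, summary in candidates])
scores = (part_vecs @ query_vec.T).squeeze(-1)             # cosine similarities

for idx in scores.argsort(descending=True):
    part_id, summary = candidates[int(idx)]
    print(f"{scores[idx].item():.3f}  {part_id}  {summary}")
```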
## Results
The test results of our reverse dictionary model for BioBricks are as follows:
| Test data | Top-1 hit rate | Top-10 hit rate | Top-100 hit rate |
|-----------|----------------|------------------|-------------------|
The top-10 hit rate is the probability that the BioBrick you want appears among the top ten items on the webpage; the top-1 and top-100 hit rates are defined analogously.
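For reference, the sketch below shows one way such hit rates can be computed, assuming each test case records the intended part ID and the ranked IDs returned by the model; the IDs shown are hypothetical, not our actual test data.

```python
# Compute top-k hit rates over (intended part, ranked predictions) pairs.
def top_k_hit_rate(cases, k):
    """Fraction of queries whose intended part appears in the top-k results."""
    hits = sum(1 for gold, ranked in cases if gold in ranked[:k])
    return hits / len(cases)

# Hypothetical test cases for illustration only.
cases = [
    ("BBa_E0040", ["BBa_E0040", "BBa_I13522", "BBa_K325909"]),
    ("BBa_B0034", ["BBa_B0030", "BBa_B0034", "BBa_B0032"]),
]
for k in (1, 10, 100):
    print(f"top-{k} hit rate: {top_k_hit_rate(cases, k):.2f}")
```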
To further validate the model in a convincing way, we constructed a test set generated by Llama2 and asked our wet-lab teammates to evaluate the output; they judged that the results matched the queries well. In addition, the wet-lab teammates made several queries based on their actual needs during their experiments, and in most cases they could find suitable BioBricks within the first ten results.