Commit 5517adcc authored by zengjianshu

improve software2023/README.md

parent 23e47481
@@ -23,3 +23,17 @@ The workflow of the project is divided into **four steps**: Data Retrieval and P
2. `biobricks_data_summary/`: Simplify and summarize the data downloaded in Step 1 using Llama2. See [biobricks_data_summary/readme.md](biobricks_data_summary/readme.md) for detailed information.
3. `data/`: The summaries from Step 2 are processed and organized into a standard format ready for model input. The data we collected is stored in the directory `data/biobricks`. See [data/readme.md](data/readme.md) for detailed information.
4. `mono/`: The BERT model is trained on the BioBrick summary data, starting from the pre-trained model. A web interface is also provided so that users can try the model. See [mono/readme.md](mono/readme.md) for more information.
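As a reverse dictionary, the model takes a description and returns a ranked list of BioBricks on the webpage. The sketch below illustrates only that final ranking step, under the assumption that the model assigns one relevance score per BioBrick for a given query. The function `rank_bricks` and the scores are hypothetical and are not taken from the project's code; the part IDs serve only as sample keys.

```python
from typing import Dict, List


def rank_bricks(scores: Dict[str, float], k: int = 10) -> List[str]:
    """Return the IDs of the k highest-scoring BioBricks, best first.

    `scores` maps a BioBrick ID to a relevance score for one query
    (hypothetical interface; the real model's output format may differ).
    Ties are broken alphabetically by ID so the ranking is deterministic.
    """
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [brick_id for brick_id, _ in ordered[:k]]


# Made-up scores for a single query, for illustration only.
example_scores = {"BBa_B0034": 0.12, "BBa_E0040": 0.81, "BBa_J23100": 0.45}
print(rank_bricks(example_scores, k=2))  # ['BBa_E0040', 'BBa_J23100']
```

Sorting the whole score table is simple but costs O(n log n). For a large catalogue, `heapq.nlargest` could avoid a full sort when only the top k results are shown, although ties would then need separate handling.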
## Results
The test results of our reverse dictionary model for BioBricks are as follows:

| Test data | Top-1 hit rate | Top-10 hit rate | Top-100 hit rate |
| ----------- | ------------- | -------------- | --------------- |
| seen | 0.992 | 1.0 | 1.0 |
| unseen | 0.39 | 0.7 | 0.856 |
| seen+unseen | 0.691 | 0.85 | 0.928 |

The top-10 hit rate is the fraction of test queries for which the intended BioBrick appears among the first ten results on the webpage. The top-1 and top-100 hit rates are defined analogously.

To further validate the model, we constructed a test set generated by Llama2 and asked our wet-lab teammates to evaluate the model's output. They judged that it matched the queries well. In addition, wet-lab teammates made several queries based on the actual needs of their experiments, and in most cases they found suitable BioBricks within the first ten results.
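The hit rates in the results table can be computed as in the sketch below. It follows the definition given with the table: a query counts as a hit at k if its intended BioBrick appears among the first k results. The function `top_k_hit_rate` and the three queries are hypothetical, and the project's own evaluation script may differ.

```python
from typing import Sequence


def top_k_hit_rate(rankings: Sequence[Sequence[str]],
                   targets: Sequence[str], k: int) -> float:
    """Fraction of queries whose intended BioBrick is among the first k results."""
    if len(rankings) != len(targets):
        raise ValueError("need exactly one target per ranked result list")
    if not targets:
        raise ValueError("cannot compute a hit rate over zero queries")
    hits = sum(1 for ranked, target in zip(rankings, targets) if target in ranked[:k])
    return hits / len(targets)


# Three hypothetical queries: the ranked results and the brick the user wanted.
rankings = [
    ["BBa_E0040", "BBa_J23100", "BBa_B0034"],
    ["BBa_J23100", "BBa_B0034", "BBa_E0040"],
    ["BBa_B0034", "BBa_E0040", "BBa_J23100"],
]
targets = ["BBa_E0040", "BBa_B0034", "BBa_J23100"]
print(top_k_hit_rate(rankings, targets, k=1))  # only the first query is a hit
print(top_k_hit_rate(rankings, targets, k=3))  # 1.0
```

As a consistency check on the table, each seen+unseen value equals the unweighted mean of the seen and unseen values. For example, (0.992 + 0.39) / 2 = 0.691. Pooling all queries produces exactly this mean when the two test sets contain the same number of queries. The set sizes are not given here, so that equality is an inference.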