# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.replit
data/*
results/*
PocketGen/*
checkpoints/*
poetry.lock
pyproject.toml
replit.nix
PocketGen
pocketgen
pocketgen/*
# temporary
benchmark.py
old_model.py
# Team Evry-Paris-Saclay 2024 Software Tool
If your team competes in the [**Software & AI** village](https://competition.igem.org/participation/villages) or wants to
apply for the [**Best Software Tool** prize](https://competition.igem.org/judging/awards), you **MUST** host all the
code of your team's software tool in this repository, `main` branch. By the **Wiki Freeze**, a
[release](https://docs.gitlab.com/ee/user/project/releases/) will be automatically created as the judging artifact of
this software tool. You will be able to keep working on your software after the Grand Jamboree.
> If your team does not have any software tool, you can totally ignore this repository. If left unchanged, this
repository will be automatically deleted by the end of the season.
## Description
Let people know what your project can do specifically. Provide context and add a link to any reference visitors might
be unfamiliar with (for example your team wiki). A list of Features or a Background subsection can also be added here.
If there are alternatives to your project, this is a good place to list differentiating factors.
## Installation
Within a particular ecosystem, there may be a common way of installing things, such as using Yarn, NuGet, or Homebrew.
However, consider the possibility that whoever is reading your README is a novice and would like more guidance. Listing
specific steps helps remove ambiguity and gets people to using your project as quickly as possible. If it only runs in a
specific context like a particular programming language version or operating system or has dependencies that have to be
installed manually, also add a Requirements subsection.
## Usage
Use examples liberally, and show the expected output if you can. It's helpful to have inline the smallest example of
usage that you can demonstrate, while providing links to more sophisticated examples if they are too long to reasonably
include in the README.
## Contributing
State if you are open to contributions and what your requirements are for accepting them.
For people who want to make changes to your project, it's helpful to have some documentation on how to get started.
Perhaps there is a script that they should run or some environment variables that they need to set. Make these steps
explicit. These instructions could also be useful to your future self.
You can also document commands to lint the code or run tests. These steps help to ensure high code quality and reduce
the likelihood that the changes inadvertently break something. Having instructions for running tests is especially
helpful if it requires external setup, such as starting a Selenium server for testing in a browser.
## Authors and acknowledgment
Show your appreciation to those who have contributed to the project.
# 🚀 Flint : AI-powered protein receptor mutation
A computational tool that generates protein receptor mutants with enhanced binding specificity towards a target ligand, based on [PocketGen]. While generating a set of mutated receptor structures, Flint evaluates their affinity for the ligand using [AutoDock Vina]. By default, our custom pre-trained model checkpoint, `__MODELNAME__`, is used for generation. The key point is that the docking simulation was embedded as the scoring function during transfer learning, making it the objective of the gradient descent (see the [Model](https://2024.igem.wiki/evry-paris-saclay/model) article on the team's wiki).
## Outputs and expected results
If executed correctly, Flint generates a set of unique mutated receptor proteins in PDB format, designed to maximize binding affinity for the ligand, and a summary file containing, for each receptor:
- The corresponding docking score, affinity constant and rank compared to other mutants.
- Additional information about the receptor-ligand interaction (e.g. ligand position, residues involved).
- The sequence of mutations that leads to its creation, starting from the original receptor.
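As a rough sketch of how the summary file could be consumed downstream, the snippet below parses a tab-separated summary and ranks receptors by docking score. The column names (`ID`, `delta_G`, `Kd`, `mutations (AA)`) follow the header written by `Model.results()` in this repository; the `rank_by_score` helper is hypothetical and the sample values are made up for illustration only.

```python
import csv
import io


def rank_by_score(tsv_text: str) -> list[dict]:
    """Parse a summary TSV and sort rows from most to least favourable delta_G."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    for row in rows:
        row["delta_G"] = float(row["delta_G"])
    # a more negative binding free energy means a stronger predicted binding
    return sorted(rows, key=lambda row: row["delta_G"])


# made-up values, for illustration only
sample = (
    "ID\tdelta_G\tKd\tmutations (AA)\n"
    "original\t-6.1\t29800.0\t0\n"
    "mutant_0\t-7.4\t267000.0\t3\n"
    "mutant_1\t-5.2\t6500.0\t5\n"
)
ranked = rank_by_score(sample)
print([row["ID"] for row in ranked])
# → ['mutant_0', 'original', 'mutant_1']
```

In practice the text would come from the `summary.tsv` file of a run folder inside the output directory.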
## Getting started with Flint
```bash
git clone https://github.com/Phagevo/Flint.git
cd Flint
git clone https://github.com/Phagevo/PocketGen.git
```
Install the environment and dependencies using [conda]'s config file
```bash
conda env create -f env.yaml
conda activate flint
```
If you intend to build the environment without conda, keep in mind that installing [AutoDock Vina] from `pip` or any other package manager is deprecated. To run the project on Windows, [this question](https://stackoverflow.com/questions/71865073/unable-to-install-autodock-vina-potentially-due-to-boost) on Stack Overflow might be helpful.
```text
├── PocketGen # cloned from the PocketGen repository
├── checkpoints # folder needs to be created manually
│   ├── __MODELNAME__.pt
│   └── pocketgen.pt
├── eval
├── model
└── main.py
```
After installing Flint, your working directory should resemble the tree above.
## Usage from command line
```bash
python main.py --receptor <receptor.pdb> --ligand <ligand.sdf> --output <output_directory>
```
- `<receptor.pdb>`: Path to the input protein receptor file in PDB format.
- `<ligand.sdf>`: Path to the input ligand file in SDF format.
- `<output_directory>`: Directory where the output mutant structures and scores will be saved.
[AutoDock Vina]: https://github.com/ccsb-scripps/AutoDock-Vina
[PocketGen]: https://github.com/zaixizhang/PocketGen
[conda]: https://docs.conda.io/
name: flint
channels:
- defaults
- conda-forge
- pytorch
- pyg
dependencies:
- python=3.11
- pytorch
- vina
- fair-esm
- rdkit
- openmm
- pdbfixer
- openbabel
- meeko
- easydict
- lmdb
- python-lmdb
- pytorch_geometric
- pyyaml
- omegaconf
- biopython
- pip
- pytorch_scatter
- pytorch_cluster
- pip:
  - git+https://github.com/Valdes-Tresanco-MS/AutoDockTools_py3
prefix: /home/paradoxe/anaconda3/envs/phagevo
import math
def kd(delta_G:float, temperature:float=298.0) -> float:
    """
    Calculates the affinity (association) constant from the binding free energy,
    K = exp(-delta_G / (R * T)). Despite the function name, this is the inverse
    of the dissociation constant Kd = exp(delta_G / (R * T)).
    @param delta_G (float): the value of the binding free energy (kcal/mol)
    @param temperature (float): temperature (kelvin)
    @return (float): affinity constant
    """
    R = 0.001987 # gas constant, in kcal/(mol*K)
    return math.exp((-delta_G) / (R * temperature))
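To make the relation between binding free energy and binding constants concrete, here is a small sketch based on the standard relation ΔG = -RT ln K_a = RT ln K_d. The helper names `association_constant` and `dissociation_constant` are hypothetical and not part of Flint, and the example energy of -8 kcal/mol is arbitrary.

```python
import math

R = 0.001987  # gas constant, kcal/(mol*K)


def association_constant(delta_g: float, temperature: float = 298.0) -> float:
    """K_a = exp(-delta_G / (R * T)); larger means tighter binding."""
    return math.exp(-delta_g / (R * temperature))


def dissociation_constant(delta_g: float, temperature: float = 298.0) -> float:
    """K_d = exp(delta_G / (R * T)) = 1 / K_a; smaller means tighter binding."""
    return math.exp(delta_g / (R * temperature))


ka = association_constant(-8.0)
kd_value = dissociation_constant(-8.0)
print(f"K_a = {ka:.3e}, K_d = {kd_value:.3e}")
```

A favourable (negative) binding energy gives K_a above 1 and K_d below 1, and the two are always reciprocal.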
import os
import time
from vina import Vina

def docking(
    receptor_file:str,
    ligand_file:str,
    center:"tuple[float, float, float]"=(0,0,0),
    box_size:"tuple[float, float, float]"=(20,20,20),
    n_dockings:int=64,
    n_poses:int=32,
    score=False,
    write=False) -> "list[float] | float":
    """
    Docking simulation function : returns the docking score(s) of the best poses (kcal/mol)
    @param receptor_file: protein (pdbqt file)
    @param ligand_file: ligand (pdbqt file)
    @param center: docking window center
    @param box_size: docking window size
    @param n_dockings: search exhaustiveness passed to Vina
    @param n_poses: maximum number of poses to generate and report
    @param score (bool): whether or not the output should be a single score
    @param write (bool): whether or not the poses should be saved to a file
    @return (list[float] | float): a list of scores or a single score
    """
    # base names of the input files (without directory or extension)
    receptor_name = os.path.splitext(os.path.basename(receptor_file))[0]
    ligand_name = os.path.splitext(os.path.basename(ligand_file))[0]

    # initialises vina
    v = Vina(sf_name='vina', verbosity=1)
    v.set_receptor(receptor_file)
    v.set_ligand_from_file(ligand_file)

    # set the docking frame
    v.compute_vina_maps(center=center, box_size=box_size)

    # scores the current pose
    energy = v.score()
    print('Score before minimization: %.3f (kcal/mol)' % energy[0])

    # minimizes locally the current pose
    energy_minimized = v.optimize()
    print('Score after minimization : %.3f (kcal/mol)' % energy_minimized[0])
    # v.write_pose(f'{ligand_name}_minimized.pdbqt', overwrite=True)

    # docks the ligand
    v.dock(exhaustiveness=n_dockings, n_poses=n_poses)

    if write:
        # the output folder has to exist before the poses are written
        os.makedirs('results/docked', exist_ok=True)
        v.write_poses(
            f'results/docked/{ligand_name}_docked_{time.time()}.pdbqt',
            n_poses=n_poses)

    output = None
    # single best score or the list of all pose energies
    if not score:
        results = v.energies(n_poses=n_poses)
        output = [energies[0] for energies in results]
    else:
        output = v.energies(n_poses=1)[0][0]

    return output
from Bio import PDB

def get_sequence(structure):
    """
    Returns the sequence of amino acids of a parsed protein structure.
    @param structure (Bio.PDB.Structure.Structure): the protein structure.
    @return (list[str]): the residue names (three-letter codes) of the protein.
    """
    sequence = []
    for model in structure:
        for chain in model:
            for residue in chain:
                if PDB.is_aa(residue):
                    sequence.append(residue.resname)
    return sequence

def mutations(protein1_path, protein2_path):
    """
    Loads two protein files and returns the number of mutations between them.
    Residues are compared position by position, so insertions and deletions
    are not detected.
    @param protein1_path (str): the first protein path.
    @param protein2_path (str): the second protein path.
    @return (int): the number of mutations between the two proteins.
    """
    parser = PDB.PDBParser()
    structure1 = parser.get_structure('protein1', protein1_path)
    structure2 = parser.get_structure('protein2', protein2_path)

    seq1 = get_sequence(structure1)
    seq2 = get_sequence(structure2)

    # collects (position, original residue, mutated residue) for each difference
    differences = []
    for i, (res1, res2) in enumerate(zip(seq1, seq2)):
        if res1 != res2:
            differences.append((i + 1, res1, res2))

    return len(differences)
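The position-by-position comparison above can be sketched without Biopython on plain residue lists. The hypothetical `list_mutations` helper below (not part of Flint) returns the substitutions themselves rather than only their count, and assumes both sequences are already aligned and numbered from 1, as `mutations` does.

```python
def list_mutations(seq1: list[str], seq2: list[str]) -> list[tuple[int, str, str]]:
    """Return (position, original, mutated) for every differing residue."""
    return [
        (i + 1, res1, res2)
        for i, (res1, res2) in enumerate(zip(seq1, seq2))
        if res1 != res2
    ]


original = ["MET", "ALA", "GLY", "SER"]
mutant = ["MET", "VAL", "GLY", "THR"]
print(list_mutations(original, mutant))
# → [(2, 'ALA', 'VAL'), (4, 'SER', 'THR')]
```

Because `zip` stops at the shorter sequence, any trailing residues of a longer sequence are ignored in this sketch.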
import os
import subprocess

def prepare(file_path:str) -> str:
    """
    Convert a molecule file (PDB receptor or SDF ligand) to PDBQT format using Open Babel.
    @param file_path: path to the input file
    @return: path to the output PDBQT file
    """
    # defines the output file name
    pdbqt_file = os.path.splitext(file_path)[0] + '.pdbqt'
    pdbqt_file = pdbqt_file.replace('raw', 'preped')

    # converts to PDBQT using Open Babel, adding hydrogens (-h)
    command = ['obabel', file_path, '-opdbqt', '-O', pdbqt_file, '-h']
    if file_path.endswith("pdb"):
        # receptors are written as a single rigid molecule
        command += ['-xc', '-xr']
    command += ['--partialcharge', 'gasteiger'] # assigns Gasteiger partial charges

    # arguments are passed as a list so that paths with spaces are handled safely
    subprocess.run(command)
    return pdbqt_file
from rdkit import Chem
from Bio.PDB import PDBParser
import numpy as np

def compute_box(
    receptor_path:str,
    ligand_path:str,
    cutoff:float=5.0,
    padding:float=5.0) -> "dict[str, tuple[float, float, float]]":
    """
    Calculates the dimensions and center of the docking box
    @param receptor_path: path to the receptor file (.pdb)
    @param ligand_path: path to the ligand file (.sdf)
    @param cutoff: capture distance for neighbour atoms (angstrom)
    @param padding: padding added on each side of the box (angstrom)
    @return: center coordinates (x, y, z) and sizes (x, y, z) of the box
    """
    ligand = Chem.SDMolSupplier(ligand_path)[0]
    ligand_coords = np.array([list(ligand.GetConformer().GetAtomPosition(i))
                              for i in range(ligand.GetNumAtoms())])

    structure = PDBParser(QUIET=True).get_structure('receptor', receptor_path)
    atoms = list(structure.get_atoms()) # get all atoms in receptor

    # compute the geometric center of the ligand (unweighted mean of atom positions)
    ligand_center = np.mean(ligand_coords, axis=0).astype(float)

    # collect receptor atoms close to the ligand center
    site_atoms = np.array([atom.coord
                           for atom in atoms
                           if np.linalg.norm(atom.coord - ligand_center) <= cutoff]).astype(float)

    # fall back to the ligand atoms if no receptor atom is within the cutoff
    if site_atoms.size == 0:
        site_atoms = ligand_coords

    # compute min/max coordinates for the docking box
    x_min, y_min, z_min = np.min(site_atoms, axis=0)
    x_max, y_max, z_max = np.max(site_atoms, axis=0)

    return {
        "center": (
            (x_min + x_max) / 2,
            (y_min + y_max) / 2,
            (z_min + z_max) / 2
        ),
        "size": (
            (x_max - x_min) + 2 * padding,
            (y_max - y_min) + 2 * padding,
            (z_max - z_min) + 2 * padding
        )
    }
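The box construction can be illustrated on toy coordinates using only the standard library. This is a simplified sketch of the same idea (bounding box of the receptor atoms near the ligand centroid, plus padding); the `bounding_box` helper and the coordinates are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

Point = tuple[float, float, float]


def bounding_box(receptor: list[Point], ligand: list[Point],
                 cutoff: float = 5.0, padding: float = 5.0) -> dict[str, Point]:
    # unweighted centroid of the ligand atoms
    centroid = tuple(sum(axis) / len(ligand) for axis in zip(*ligand))
    # receptor atoms within `cutoff` of the centroid; fall back to the ligand atoms
    site = [atom for atom in receptor if math.dist(atom, centroid) <= cutoff] or ligand
    lows = [min(axis) for axis in zip(*site)]
    highs = [max(axis) for axis in zip(*site)]
    return {
        "center": tuple((lo + hi) / 2 for lo, hi in zip(lows, highs)),
        "size": tuple((hi - lo) + 2 * padding for lo, hi in zip(lows, highs)),
    }


receptor = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 2.0, 0.0), (50.0, 50.0, 50.0)]
ligand = [(1.0, 1.0, 0.0), (3.0, 1.0, 0.0)]
box = bounding_box(receptor, ligand)
print(box)
# → {'center': (2.0, 1.0, 0.0), 'size': (14.0, 12.0, 10.0)}
```

The distant atom at (50, 50, 50) is excluded by the cutoff, so the box only spans the three atoms around the ligand.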
from model.Model import Model
import argparse
import torch

# if called from command line
if __name__ == "__main__":
    torch.set_warn_always(False)
    parser = argparse.ArgumentParser()
    parser.add_argument("-d", "--device", type=str, default="cuda:0", help="Set the device (cpu or cuda:0)")
    parser.add_argument("-o", "--output", type=str, default="./results", help="Set the path for the output directory")
    parser.add_argument("-v", "--verbose", type=int, choices=[0, 1, 2], default=1, help="Set the verbosity between 0 and 2")
    parser.add_argument("--receptor", type=str, required=True, help="Set the receptor filepath")
    parser.add_argument("--ligand", type=str, required=True, help="Set the ligand filepath")
    parser.add_argument("-n", "--number", type=int, default=8, help="Choose the number of generated mutants")

    # parse arguments
    args = parser.parse_args()

    # instantiates the model with args
    flint = Model("./checkpoints/checkpoint.pt", {
        "device": args.device,
        "output": args.output,
        "verbose": args.verbose,
        "number": args.number
    })

    # pass molecule files to the model
    flint.input(args.receptor, args.ligand)

    # begin the inference / generate mutants
    # flint.generate()

    # output the results and write the summary file
    flint.results()
import esm
import torch
import os
import shutil
from torch.utils.data import DataLoader
from functools import partial
import numpy as np
from PocketGen.models.PD import Pocket_Design_new
from PocketGen.utils.misc import seed_all, load_config
from PocketGen.utils.transforms import FeaturizeProteinAtom, FeaturizeLigandAtom
from PocketGen.utils.data import collate_mols_block
from .sampler import interaction
from eval.docking import docking
from eval.prepare import prepare
from eval.window import compute_box
from eval.mutations import mutations
from eval.chemutils import kd
class Model:
    def __init__(self, checkpoint_path:str, args):
        """
        The mutant generation model constructor. This method sets up the
        torch and CUDA environment, loads the checkpoint and then builds a PocketGen
        instance using the weights from the checkpoint and the parameters retrieved.
        @param checkpoint_path (str): Path to checkpoint (.pt) file for PocketGen.
        @param args (dict): settings with the keys "device", "output", "number" and
        "verbose" (0 for quiet, 1 for necessary information and 2 for debug).
        """
        # setup global class variables
        self.verbose = args["verbose"]
        self.device = args["device"]
        self.outputdir = args["output"]
        self.size = args["number"]
        self.sources = []
        self.config = load_config('./PocketGen/configs/train_model.yml')

        if self.verbose > 0:
            print('Flint setup started, please wait.')
        if self.verbose == 2:
            print('Now initializing pytorch and CUDA environment :')

        # cleans cache and sets the libs seeds
        torch.cuda.empty_cache()
        seed_all(2089)

        if self.verbose == 2:
            print('\tpytorch and CUDA initialized correctly.')
            print('Now retrieving alphabet from fair-ESM :')

        # sets ESM2 alphabet as the usual alphabet
        pretrained_model, self.alphabet = esm.pretrained.load_model_and_alphabet_hub('esm2_t33_650M_UR50D')
        del pretrained_model # the ESM2 pretrained model is not needed here, so it is deleted from memory

        if self.verbose == 2:
            print('\tESM alphabet successfully loaded.')
            print('Now building PocketGen model :')

        # get the model checkpoint from .pt file
        self.checkpoint = torch.load(checkpoint_path, map_location=self.device)

        if self.verbose == 2:
            print('\tcheckpoint successfully loaded from file.')

        # instantiate PocketGen model for pocket design
        self.model = Pocket_Design_new(
            self.config.model,
            protein_atom_feature_dim=FeaturizeProteinAtom().feature_dim,
            ligand_atom_feature_dim=FeaturizeLigandAtom().feature_dim,
            device=self.device
        )

        if self.verbose == 2:
            print("\tPocketGen model instantiated successfully.")

        # send model to selected device
        self.model = self.model.to(self.device)

        if self.verbose == 2:
            print('\tPocketGen model sent to selected device.')

        # load current saved checkpoint into model
        self.model.load_state_dict(self.checkpoint['model'])

        if self.verbose == 2:
            print('\tcheckpoint loaded into PocketGen.')
            print('End of setup, model can now be used.\n\n')
    def input(self, receptor_path:str, ligand_path:str) -> "Model":
        """
        Loads a protein receptor and a ligand from files and stores them in
        a data-loader, usable by the model when generating mutants.
        @param receptor_path (str): path to the receptor PDB file.
        @param ligand_path (str): path to the ligand SDF file.
        @return (Model): the instance of Model, for chainability purposes.
        """
        if self.verbose == 2:
            print('Now parsing data from receptor and ligand :')

        # get dense features from receptor-ligand interaction
        features = interaction(receptor_path, ligand_path)

        if self.verbose == 2:
            print('\tsuccessfully parsed interaction features.\n')
            print('Now building the pytorch dataloader :')

        # initialize the data loader (including batch converter)
        self.loader = DataLoader(
            [features for _ in range(self.size)],
            batch_size=1,
            shuffle=False,
            num_workers=self.config.train.num_workers,
            collate_fn=partial(
                collate_mols_block,
                batch_converter=self.alphabet.get_batch_converter()
            )
        )

        # stores the source input files to compare
        self.sources = [receptor_path, ligand_path]

        if self.verbose == 2:
            print('\tpytorch dataloader built correctly.')

        return self
    def generate(self) -> "Model":
        """
        Generates mutants based on the input protein receptor.
        @return (Model): the instance of Model, for chainability purposes.
        """
        if self.verbose > 0:
            print("Now generating new mutant protein receptors :")

        # place it in eval mode
        self.model.eval()

        # creates the inference directory
        n_runs = self._nruns()
        run_dir = os.path.join(self.outputdir, f"run_{n_runs}")
        os.makedirs(run_dir)

        # no need to compute gradients during inference
        with torch.no_grad():
            for b, batch in enumerate(self.loader):
                # move batch to selected device
                batch = {k: v.to(self.device) if isinstance(v, torch.Tensor) else v for k, v in batch.items()}

                # starts the inference for a single mutant
                self.model.generate(
                    batch, target_path=os.path.join(run_dir, f"mutant_{b}")
                )

                # stores the original input files for comparison
                os.makedirs(os.path.join(run_dir, "original"), exist_ok=True)
                shutil.copyfile(self.sources[0], os.path.join(run_dir, "original", "orig_receptor.pdb"))
                shutil.copyfile(self.sources[1], os.path.join(run_dir, "original", "orig_ligand.sdf"))

                if self.verbose > 0:
                    print("\tinference done on a batch.")

        return self
    def results(self) -> "Model":
        """
        Writes results in a summary file, along with all generated PDBs.
        @return (Model): the instance of Model, for chainability purposes.
        """
        if self.verbose > 0:
            print("Now writing output files :")

        for run in range(self._nruns()):
            run_dir = os.path.join(self.outputdir, f"run_{run}")

            # initialize the resulting summary TSV
            summary = "ID\tdelta_G\tKd\tmutations (AA)\n"

            # write original inputs docking in summary
            src_mean_dg, src_mean_kd = self._dock(
                os.path.join(run_dir, "original", "orig_receptor.pdb"),
                os.path.join(run_dir, "original", "orig_ligand.sdf")
            )
            summary += f"original\t{src_mean_dg}\t{src_mean_kd}\t0" + "\n"

            for b in range(self._nbatches(run_dir)):
                receptor_path = os.path.join(run_dir, f"mutant_{b}", f"{b}_whole.pdb")
                ligand_path = os.path.join(run_dir, f"mutant_{b}", f"{b}.sdf")
                mean_dg, mean_kd = self._dock(receptor_path, ligand_path)

                # find the number of mutations (AA-level)
                n_mutations = mutations(
                    os.path.join(run_dir, "original", "orig_receptor.pdb"),
                    receptor_path
                )
                summary += f"mutant_{b}\t{mean_dg}\t{mean_kd}\t{n_mutations}" + "\n"

                if self.verbose == 2:
                    print("\twrote one new entry in the summary file.")

            # write summary to a local file
            with open(os.path.join(run_dir, "summary.tsv"), "w") as file:
                file.write(summary)

            if self.verbose > 0:
                print(f"You can find the run #{run} summary in your output folder.")

        return self
    def _dock(self, receptor_path, ligand_path):
        """
        runs a docking simulation and averages its results over all poses
        @return (tuple[float, float]): the mean deltaG and the mean value returned by kd()
        """
        # compute the docking window around ligand
        docking_box = compute_box(receptor_path, ligand_path)

        try:
            energies = docking(
                receptor_file=prepare(receptor_path),
                ligand_file=prepare(ligand_path),
                center=docking_box["center"],
                box_size=docking_box["size"]
            )
        except Exception as e:
            # falls back to a single null energy if the docking fails
            print(f"\t\terror simulating docking: {e}")
            energies = np.zeros(1)

        # calculates the mean Kd and deltaG
        return np.mean(energies), np.mean([kd(e) for e in energies])

    def _nruns(self) -> int:
        """
        returns the number of inferences stored so far in the output directory
        @return (int): the number of folders in dir
        """
        os.makedirs(self.outputdir, exist_ok=True)
        return len([f for f in os.listdir(self.outputdir) if os.path.isdir(os.path.join(self.outputdir, f))])

    def _nbatches(self, run_path) -> int:
        """
        returns the number of mutants stored in a run directory
        @param run_path (str): path to the run directory
        @return (int): the number of folders in dir, not counting the "original" folder
        """
        os.makedirs(run_path, exist_ok=True)
        return len([f for f in os.listdir(run_path) if os.path.isdir(os.path.join(run_path, f))]) - 1
from PocketGen.utils.transforms import FeaturizeProteinAtom, FeaturizeLigandAtom
from torch_geometric.transforms import Compose
import torch
def densify(features:dict) -> torch.Tensor:
    """
    Transforms a set of human-level features to a dense data tensor.
    @param features (dict): a feature-dict returned by featurize()
    @return (torch.Tensor): a dense-data torch tensor representing features.
    """
    return Compose([
        FeaturizeProteinAtom(),
        FeaturizeLigandAtom(),
    ])(features)

def featurize(
    protein_dict=None,
    ligand_dict=None,
    residue_dict=None,
    seq=None,
    full_seq_index=None,
    r10_index=None) -> dict:
    """
    Transforms molecule interaction data into a feature
    dict that is interpretable by the densify function.
    @param protein_dict (dict): a dictionary representation of the receptor
    @param ligand_dict (dict): a dictionary representation of the ligand
    @param residue_dict (dict): a dictionary representation of the residue
    @param seq (str): amino-acid sequence of the whole receptor
    @param full_seq_index (torch.Tensor): #################
    @param r10_index (torch.Tensor): indexes of the residues (r < 10 around ligand)
    @return (dict): a feature dictionary
    """
    # avoids sharing mutable default arguments between calls
    protein_dict = protein_dict or {}
    ligand_dict = ligand_dict or {}
    residue_dict = residue_dict or {}

    # concatenates the first 3 dicts (prot, lig and residue)
    features = dict({f"protein_{k}":v for k,v in protein_dict.items()},
                    **{f"ligand_{k}":v for k,v in ligand_dict.items()})
    features.update(residue_dict)

    # adds keys for simple variables
    features.update({
        'full_seq_idx': full_seq_index,
        'r10_idx': r10_index,
        'seq': seq
    })

    return features
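The key-prefixing merge performed by `featurize` can be seen on plain dictionaries. The sketch below reproduces only that merging step with a hypothetical `merge_features` helper and toy values; it does not call PocketGen, and the key names are made up for illustration.

```python
def merge_features(protein: dict, ligand: dict, residue: dict, **extras) -> dict:
    """Prefix protein/ligand keys, then add residue keys and extras unchanged."""
    features = {f"protein_{k}": v for k, v in protein.items()}
    features.update({f"ligand_{k}": v for k, v in ligand.items()})
    features.update(residue)
    features.update(extras)
    return features


merged = merge_features(
    protein={"pos": [0.0, 1.0]},
    ligand={"pos": [2.0]},
    residue={"amino_acid": [5]},
    seq="MAG",
)
print(sorted(merged))
# → ['amino_acid', 'ligand_pos', 'protein_pos', 'seq']
```

Prefixing keeps identically named protein and ligand entries (here `pos`) from overwriting each other in the merged dictionary.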
import torch
from .featurize import densify, featurize
from PocketGen.utils.protein_ligand import PDBProtein, parse_sdf_file
from PocketGen.utils.data import torchify_dict
def interaction(receptor_path:str, ligand_path:str) -> torch.Tensor:
    """
    Convert PDB and SDF files into a set of protein-ligand interaction features.
    @param receptor_path (str): path to the receptor PDB file.
    @param ligand_path (str): path to the ligand SDF file.
    @return (torch.Tensor): a data-dense feature tensor representing the interaction.
    """
    # reads and parses the mol (pdb / sdf) files
    with open(receptor_path, 'r') as pdb_file:
        pdb_block = pdb_file.read()
    protein = PDBProtein(pdb_block)
    ligand_dict = parse_sdf_file(ligand_path, feat=False)

    # select only the residues inside a radius around the ligand
    r10_index, r10_residues = protein.query_residues_ligand(ligand_dict, radius=10, selected_residue=None, return_mask=False)
    full_seq_index, full_seq_residues = protein.query_residues_ligand(ligand_dict, radius=3.5, selected_residue=r10_residues, return_mask=False)

    # defines pocket from the (r < 10) residues
    pocket = PDBProtein(protein.residues_to_pdb_block(r10_residues))
    pocket_dict = pocket.to_dict_atom()
    residue_dict = pocket.to_dict_residue()

    # defines the scope of protein_edit_residue (should be of type torch.Tensor[bool])
    _, residue_dict['protein_edit_residue'] = pocket.query_residues_ligand(ligand_dict)

    full_seq_index.sort()
    r10_index.sort()

    # transforms data into features
    data = featurize(
        protein_dict=torchify_dict(pocket_dict),
        ligand_dict=torchify_dict(ligand_dict),
        residue_dict=torchify_dict(residue_dict),
        seq=''.join(protein.to_dict_residue()['seq']),
        full_seq_index=torch.tensor(full_seq_index),
        r10_index=torch.tensor(r10_index)
    )

    # add metadata
    data.update({
        'protein_filename': receptor_path,
        'ligand_filename': ligand_path,
        'whole_protein_name': receptor_path
    })

    # return data-dense features tensor
    return densify(data)