Compare revisions

CallMeKitsu. · paradoxe-tech
--- a/.gitignore
+++ b/.gitignore
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+#   For a library or package, you might want to ignore these files since the code is
+#   intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+
+# poetry
+#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+#   This is especially recommended for binary packages to ensure reproducibility, and is more
+#   commonly ignored for libraries.
+#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+#poetry.lock
+
+# pdm
+#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#pdm.lock
+#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+#   in version control.
+#   https://pdm.fming.dev/#use-with-ide
+.pdm.toml
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+#  and can be added to the global gitignore or merged into this file.  For a more nuclear
+#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
+#.idea/
+
+.replit
+data/*
+results/*
+PocketGen/*
+checkpoints/*
+poetry.lock
+pyproject.toml
+replit.nix
+PocketGen
+pocketgen
+pocketgen/*
+
+# temporary
+benchmark.py
+old_model.py
\ No newline at end of file
--- a/README.md
+++ b/README.md
-# Team Evry-Paris-Saclay 2024 Software Tool
-
-If your team competes in the [**Software & AI** village](https://competition.igem.org/participation/villages) or wants to
-apply for the [**Best Software Tool** prize](https://competition.igem.org/judging/awards), you **MUST** host all the
-code of your team's software tool in this repository, `main` branch. By the **Wiki Freeze**, a
-[release](https://docs.gitlab.com/ee/user/project/releases/) will be automatically created as the judging artifact of
-this software tool. You will be able to keep working on your software after the Grand Jamboree.
-
-> If your team does not have any software tool, you can totally ignore this repository. If left unchanged, this
-repository will be automatically deleted by the end of the season.
-
-
-
-## Description
-Let people know what your project can do specifically. Provide context and add a link to any reference visitors might
-be unfamiliar with (for example your team wiki). A list of Features or a Background subsection can also be added here.
-If there are alternatives to your project, this is a good place to list differentiating factors.
-
-## Installation
-Within a particular ecosystem, there may be a common way of installing things, such as using Yarn, NuGet, or Homebrew.
-However, consider the possibility that whoever is reading your README is a novice and would like more guidance. Listing
-specific steps helps remove ambiguity and gets people to using your project as quickly as possible. If it only runs in a
-specific context like a particular programming language version or operating system or has dependencies that have to be
-installed manually, also add a Requirements subsection.
-
-## Usage
-Use examples liberally, and show the expected output if you can. It's helpful to have inline the smallest example of
-usage that you can demonstrate, while providing links to more sophisticated examples if they are too long to reasonably
-include in the README.
-
-## Contributing
-State if you are open to contributions and what your requirements are for accepting them.
-
-For people who want to make changes to your project, it's helpful to have some documentation on how to get started.
-Perhaps there is a script that they should run or some environment variables that they need to set. Make these steps
-explicit. These instructions could also be useful to your future self.
-
-You can also document commands to lint the code or run tests. These steps help to ensure high code quality and reduce
-the likelihood that the changes inadvertently break something. Having instructions for running tests is especially
-helpful if it requires external setup, such as starting a Selenium server for testing in a browser.
-
-## Authors and acknowledgment
-Show your appreciation to those who have contributed to the project.
+# 🚀 Flint: AI-powered protein receptor mutation
+
+A computational tool, based on [PocketGen], that generates protein receptor mutants with enhanced binding specificity towards a target ligand. While generating a set of mutated receptor structures, Flint evaluates their affinity for the ligand using [AutoDock Vina]. By default, generation uses our custom pre-trained model checkpoint, `__MODELNAME__`. The key point is that the docking simulation was embedded as the scoring function during transfer learning, making the docking score the objective of the gradient descent (see the [Model](https://2024.igem.wiki/evry-paris-saclay/model) article on the team's wiki).
+
+## Outputs and expected results
+
+If executed correctly, Flint generates a set of unique mutated receptor proteins in PDB format, designed to maximize binding affinity for the ligand, and a summary file containing, for each receptor:
+   - The corresponding docking score, affinity constant and rank compared to other mutants.
+   - Additional information about the receptor-ligand interaction (e.g. ligand position, residues involved).
+   - The sequence of mutations that leads to its creation, starting from the original receptor.
+
+## Getting started with Flint
+
+```bash
+git clone https://github.com/Phagevo/Flint.git
+cd Flint
+git clone https://github.com/Phagevo/PocketGen.git
+```
+Install the environment and dependencies using the conda environment file:
+```bash
+conda env create -f env.yaml
+conda activate flint
+```
+If you intend to build the environment without conda, keep in mind that installing [AutoDock Vina] from `pip` or any other package manager is deprecated. To run the project on Windows, [this question](https://stackoverflow.com/questions/71865073/unable-to-install-autodock-vina-potentially-due-to-boost) on Stack Overflow might be helpful.
+```text
+├── PocketGen # cloned from the PocketGen repository
+├── checkpoints # folder needs to be created manually
+│   ├── __MODELNAME__.pt
+│   └── pocketgen.pt 
+│
+├── eval
+├── model
+└── main.py
+```
+Your working directory should resemble the tree above after installing Flint.
+
+## Usage from command line
+```bash
+python main.py --receptor <receptor.pdb> --ligand <ligand.sdf> --output <output_directory>
+```
+- `<receptor.pdb>`: Path to the input protein receptor file in PDB format.
+- `<ligand.sdf>`: Path to the input ligand file in SDF format.
+- `<output_directory>`: Directory where the output mutant structures and scores will be saved.
+
+[AutoDock Vina]: https://github.com/ccsb-scripps/AutoDock-Vina
+[PocketGen]: https://github.com/zaixizhang/PocketGen
\ No newline at end of file
--- a/env.yaml
+++ b/env.yaml
+name: flint
+channels:
+  - defaults
+  - conda-forge
+  - pytorch
+  - pyg
+dependencies:
+  - python=3.11
+  - pytorch
+  - vina
+  - fair-esm
+  - rdkit
+  - openmm
+  - pdbfixer
+  - openbabel
+  - meeko
+  - easydict
+  - lmdb
+  - python-lmdb
+  - pytorch_geometric
+  - pyyaml
+  - omegaconf
+  - biopython
+  - pip
+  - pytorch_scatter
+  - pytorch_cluster
+  - pip:
+    - git+https://github.com/Valdes-Tresanco-MS/AutoDockTools_py3
+prefix: /home/paradoxe/anaconda3/envs/phagevo
--- a/eval/chemutils.py
+++ b/eval/chemutils.py
+import math
+
+def kd(delta_G:float, temperature:float=298.0) -> float:
+  """
+  Calculates the affinity (association) constant from the binding free energy,
+  K_a = exp(-delta_G / (R * T)). The dissociation constant is its reciprocal.
+  @param delta_G (float): the value of the binding free energy (kcal/mol)
+  @param temperature (float): temperature (kelvin)
+  @return (float): affinity constant
+  """
+  
+  R = 0.001987 # gas constant, kcal/(mol*K)
+  return math.exp((-delta_G) / (R * temperature))
\ No newline at end of file
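Note on `eval/chemutils.py`: the relation between binding free energy and the two equilibrium constants can be checked with a small standalone sketch. The function names `association_constant` and `dissociation_constant` are hypothetical and not part of the repository; the only assumptions are the standard relations K_a = exp(-ΔG/RT) and K_d = 1/K_a, with ΔG in kcal/mol.

```python
import math

R_KCAL = 0.001987  # gas constant in kcal/(mol*K), same value as in chemutils.py


def association_constant(delta_g: float, temperature: float = 298.0) -> float:
    """K_a = exp(-dG / RT); this is the quantity returned by kd() in the diff."""
    return math.exp(-delta_g / (R_KCAL * temperature))


def dissociation_constant(delta_g: float, temperature: float = 298.0) -> float:
    """K_d = exp(dG / RT), the reciprocal of the association constant."""
    return math.exp(delta_g / (R_KCAL * temperature))


if __name__ == "__main__":
    dg = -8.0  # illustrative docking score in kcal/mol, not a measured value
    print("K_a:", association_constant(dg))
    print("K_d:", dissociation_constant(dg))
```

A more negative ΔG gives a larger K_a and a smaller K_d, so the `Kd` column written by `results()` grows as binding gets stronger.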
--- a/eval/docking.py
+++ b/eval/docking.py
+import os
+from vina import Vina
+import time
+
+def docking(
+  receptor_file:str, 
+  ligand_file:str, 
+  center:"tuple[float, float, float]"=(0,0,0), 
+  box_size:"tuple[float, float, float]"=(20,20,20),
+  n_dockings:int=64, 
+  n_poses:int=32,
+  score=False,
+  write=False) -> "list[float] | float":
+
+  """
+  Runs an AutoDock Vina docking simulation and returns the docked pose energies (kcal/mol).
+  @param receptor_file: protein (pdbqt file)
+  @param ligand_file: ligand (pdbqt file)
+  @param center: docking window center
+  @param box_size: docking window size
+  @param n_dockings: search exhaustiveness passed to Vina
+  @param n_poses: number of poses to report (and to write when `write` is set)
+  @param score (bool): whether or not the output should be a single score
+  @param write (bool): whether or not the poses should be saved to a file
+  @return (list[float] | float): a list of scores or a single score
+  """
+
+  # file names without directory and extension
+  receptor_name = os.path.splitext(os.path.basename(receptor_file))[0]
+  ligand_name = os.path.splitext(os.path.basename(ligand_file))[0]
+
+  # initialises vina
+  v = Vina(sf_name='vina', verbosity=1)
+  v.set_receptor(receptor_file)
+  v.set_ligand_from_file(ligand_file)
+
+  # set the docking frame
+  v.compute_vina_maps(center=center,box_size=box_size)
+
+  # scores the current pose
+  energy = v.score()
+  print('Score before minimization: %.3f (kcal/mol)' % energy[0])
+
+  # minimizes locally the current pose
+  energy_minimized = v.optimize()
+  print('Score after minimization : %.3f (kcal/mol)' % energy_minimized[0])
+  # v.write_pose(f'{ligand_name}_minimized.pdbqt', overwrite=True)
+
+  # docks the ligand
+  v.dock(exhaustiveness=n_dockings, n_poses=n_poses)
+
+  if write:
+    v.write_poses(
+      f'results/docked/{ligand_name}_docked_{time.time()}.pdbqt', 
+      n_poses=n_poses)
+
+  output = None
+
+  # single best score or the list of all pose energies
+  if not score:
+    results = v.energies(n_poses=n_poses)
+    output = [energies[0] for energies in results]
+  else:
+    output = v.energies(n_poses=1)[0][0]
+
+  return output
--- a/eval/mutations.py
+++ b/eval/mutations.py
+from Bio import PDB
+
+def get_sequence(structure):
+  """
+  Returns the sequence of amino acids of a parsed protein structure.
+  @param structure (Bio.PDB.Structure.Structure): the parsed protein structure.
+  @return (list[str]): the three-letter residue names of the protein.
+  """
+  sequence = []
+  for model in structure:
+    for chain in model:
+      for residue in chain:
+        if PDB.is_aa(residue):
+          sequence.append(residue.resname)
+  return sequence
+
+
+def mutations(protein1_path, protein2_path):
+  """
+  Loads two protein structures from PDB files and returns the number of
+  positions at which their amino-acid sequences differ.
+  @param protein1_path (str): the first protein path.
+  @param protein2_path (str): the second protein path.
+  @return (int): the number of mutations between the two proteins.
+  """
+  parser = PDB.PDBParser()
+  structure1 = parser.get_structure('protein1', protein1_path)
+  structure2 = parser.get_structure('protein2', protein2_path)
+  seq1 = get_sequence(structure1)
+  seq2 = get_sequence(structure2)
+  mutations = []
+  for i, (res1, res2) in enumerate(zip(seq1, seq2)):
+    if res1 != res2:
+      mutations.append((i + 1, res1, res2))
+  return len(mutations)
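Note on `eval/mutations.py`: the README mentions reporting the sequence of mutations for each receptor, while `mutations()` returns only a count. A minimal sketch of the position-by-position comparison is given here on plain lists of residue names, so it runs without Biopython. `list_mutations` is a hypothetical helper, and raising on a length mismatch is a design choice of this sketch (the `zip` in the diff silently truncates to the shorter sequence).

```python
def list_mutations(seq1, seq2):
    """Return (position, original, mutant) for every differing position.

    Positions are 1-based, as in the (i + 1, res1, res2) tuples of the diff.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences differ in length; align them first")
    return [
        (index + 1, res1, res2)
        for index, (res1, res2) in enumerate(zip(seq1, seq2))
        if res1 != res2
    ]


# made-up four-residue example, three-letter codes as returned by get_sequence()
original = ["MET", "ALA", "GLY", "SER"]
mutant = ["MET", "VAL", "GLY", "THR"]

if __name__ == "__main__":
    print(list_mutations(original, mutant))
    # [(2, 'ALA', 'VAL'), (4, 'SER', 'THR')]
```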
--- a/eval/prepare.py
+++ b/eval/prepare.py
+import os
+import subprocess
+
+def prepare(file_path:str) -> str:
+    """
+    Convert a molecule file (PDB receptor or SDF ligand) to PDBQT format using Open Babel.
+    @param file_path: path to the input molecule file
+    @return: path to the output PDBQT file
+    """
+
+    # defines the output file name
+    pdbqt_file = os.path.splitext(file_path)[0] + '.pdbqt'
+    pdbqt_file = pdbqt_file.replace('raw', 'preped')
+    
+    # converts PDB to PDBQT using Open Babel
+    flags = "-xc -xr" if file_path.endswith("pdb") else ""
+    command = f'obabel {file_path} -opdbqt -O {pdbqt_file} -h {flags}'
+    command += " --partialcharge gasteiger" # assigns Gasteiger partial charges
+    subprocess.run(command, shell=True)
+
+    return pdbqt_file
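Note on `eval/prepare.py`: building the Open Babel call as an argument list keeps every flag a separate token, which sidesteps spacing and quoting issues of a concatenated shell string. The sketch below only constructs the command and does not run it. `build_obabel_command` is a hypothetical helper; the flags are the ones used in the diff, and the list could be passed to `subprocess.run(command)` without `shell=True`.

```python
import os


def build_obabel_command(file_path: str) -> list:
    """Build the obabel argument list for a PDB or SDF input file."""
    # same output naming as prepare(): swap the extension, 'raw' -> 'preped'
    pdbqt_file = os.path.splitext(file_path)[0] + ".pdbqt"
    pdbqt_file = pdbqt_file.replace("raw", "preped")

    command = ["obabel", file_path, "-opdbqt", "-O", pdbqt_file, "-h"]
    if file_path.endswith(".pdb"):
        # receptor-only output options, as in the diff
        command += ["-xc", "-xr"]
    command += ["--partialcharge", "gasteiger"]
    return command


if __name__ == "__main__":
    print(build_obabel_command("data/raw/receptor.pdb"))
    print(build_obabel_command("ligand.sdf"))
```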
--- a/eval/window.py
+++ b/eval/window.py
+from rdkit import Chem
+from Bio.PDB import PDBParser
+import numpy as np
+
+def compute_box(
+    receptor_path:str, 
+    ligand_path:str, 
+    cutoff:float=5.0, 
+    padding:float=5.0) -> "dict[str, tuple[float, float, float]]":
+    
+    """
+    calculates the dimensions and center of the docking box
+    @param receptor_path: path to the receptor file (.pdb)
+    @param ligand_path: path to the ligand file (.sdf)
+    @param cutoff: capture distance for neighbour atoms (angstrom)
+    @param padding: padding around the box to ensure the ligand is inside (angstrom)
+    @return: center coordinates (x, y, z) and sizes (x, y, z) of the box
+    """
+    
+    ligand = Chem.SDMolSupplier(ligand_path)[0]
+    ligand_coords = np.array([list(ligand.GetConformer().GetAtomPosition(i)) 
+        for i in range(ligand.GetNumAtoms())])
+    structure = PDBParser(QUIET=True).get_structure('receptor', receptor_path)
+    atoms = list(structure.get_atoms()) # get all atoms in receptor
+
+    # compute the geometric center of the ligand (unweighted centroid, not the center of mass)
+    ligand_center = np.mean(ligand_coords, axis=0).astype(float)
+    
+    # collect atoms close to the ligand
+    site_atoms = np.array([atom.coord 
+        for atom in atoms 
+        if np.linalg.norm(atom.coord - ligand_center) <= cutoff]).astype(float)
+    if site_atoms.size == 0:
+        site_atoms = ligand_coords
+
+    # compute min/max coordinates for the docking box
+    x_min, y_min, z_min = np.min(site_atoms, axis=0)
+    x_max, y_max, z_max = np.max(site_atoms, axis=0)
+    
+    return {
+        "center": (
+            (x_min + x_max) / 2, 
+            (y_min + y_max) / 2, 
+            (z_min + z_max) / 2
+        ),
+        "size": (
+            (x_max - x_min) + 2 * padding, 
+            (y_max - y_min) + 2 * padding, 
+            (z_max - z_min) + 2 * padding
+        )
+    }
\ No newline at end of file
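Note on `eval/window.py`: the box returned by `compute_box` is the axis-aligned bounding box of the selected atoms, grown by `padding` on every side. The arithmetic can be checked on a handful of made-up coordinates with a pure-Python sketch. `bounding_box` is a hypothetical name, and the three points are illustrative, not taken from a real structure.

```python
def bounding_box(points, padding=5.0):
    """Center and size of the padded axis-aligned box around 3D points."""
    xs, ys, zs = zip(*points)
    mins = (min(xs), min(ys), min(zs))
    maxs = (max(xs), max(ys), max(zs))
    center = tuple((low + high) / 2 for low, high in zip(mins, maxs))
    # padding is added on both sides of each axis, hence the factor 2
    size = tuple((high - low) + 2 * padding for low, high in zip(mins, maxs))
    return {"center": center, "size": size}


site_atoms = [(0.0, 0.0, 0.0), (4.0, 2.0, 6.0), (2.0, -2.0, 3.0)]
box = bounding_box(site_atoms)

if __name__ == "__main__":
    print(box["center"])  # (2.0, 0.0, 3.0)
    print(box["size"])    # (14.0, 14.0, 16.0)
```

With the default 5 Å cutoff and 5 Å padding, a site spanning 4 Å along one axis yields a 14 Å box edge along that axis.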
--- a/main.py
+++ b/main.py
+from model.Model import Model
+import argparse
+import torch
+
+# if called from command line
+if __name__ == "__main__":
+  torch.set_warn_always(False)
+  parser = argparse.ArgumentParser()
+
+  parser.add_argument("-d", "--device", type=str, default="cuda:0", help="Set the device (cpu or cuda:0)")
+  parser.add_argument("-o", "--output", type=str, default="./results", help="Set the path for the output directory")
+  parser.add_argument("-v", "--verbose", type=int, choices=[0, 1, 2], default=1, help="Set the verbosity between 0 and 2")
+  parser.add_argument("--receptor", type=str, required=True, help="Set the receptor filepath")
+  parser.add_argument("--ligand", type=str, required=True, help="Set the ligand filepath")
+  parser.add_argument("-n", "--number", type=int, default=8, help="Choose the number of generated mutants")
+
+
+  # parse arguments
+  args = parser.parse_args()
+  
+  # instantiates the model with args
+  flint = Model("./checkpoints/checkpoint.pt", {
+    "device": args.device,
+    "output": args.output,
+    "verbose": args.verbose,
+    "number": args.number
+  })
+  
+  # pass molecule files to the model 
+  flint.input(args.receptor, args.ligand)
+
+  # begin the inference / generate mutants
+  flint.generate()
+
+  # output the results and write the summary file
+  flint.results()
\ No newline at end of file
--- a/model/Model.py
+++ b/model/Model.py
+import esm
+import torch
+import os 
+import shutil
+from torch.utils.data import DataLoader
+from functools import partial
+import numpy as np
+
+from PocketGen.models.PD import Pocket_Design_new
+from PocketGen.utils.misc import seed_all, load_config
+from PocketGen.utils.transforms import FeaturizeProteinAtom, FeaturizeLigandAtom
+from PocketGen.utils.data import collate_mols_block
+
+from .sampler import interaction
+from eval.docking import docking
+from eval.prepare import prepare
+from eval.window import compute_box
+from eval.mutations import mutations
+from eval.chemutils import kd
+
+class Model:
+  def __init__(self, checkpoint_path:str, args):
+    """
+    The mutant generation model constructor. This method sets up the 
+    torch and CUDA environment, loads the checkpoint and builds a PocketGen 
+    instance using the weights from the checkpoint and the parameters retrieved.
+    @param checkpoint_path (str): Path to checkpoint (.pt) file for PocketGen.
+    @param args (dict): options with keys "device", "output", "number" and "verbose"
+      (verbose: 0 for quiet, 1 for necessary information and 2 for debug).
+    """
+
+    # setup global class variables
+    self.verbose = args["verbose"]
+    self.device = args["device"]
+    self.outputdir = args["output"]
+    self.size = args["number"]
+    self.sources = []
+    self.config = load_config('./PocketGen/configs/train_model.yml')
+    
+    if self.verbose > 0:
+      print('Flint setup started, please wait.')
+    if self.verbose == 2:
+      print('Now initializing pytorch and CUDA environment :')
+
+    # cleans cache and sets the libs seeds
+    torch.cuda.empty_cache()
+    seed_all(2089)
+
+    if self.verbose == 2:
+      print('\tpytorch and CUDA initialized correctly.')
+      print('Now retrieving alphabet from fair-ESM :')
+
+    # sets ESM2 alphabet as the usual alphabet
+    pretrained_model, self.alphabet = esm.pretrained.load_model_and_alphabet_hub('esm2_t33_650M_UR50D')
+    del pretrained_model # ESM2 pretrained_model that we don't need here is deleted from memory
+
+    if self.verbose == 2:
+      print('\tESM alphabet successfully loaded.')
+      print('Now building PocketGen model :')
+
+    # get the model checkpoint from .pt file
+    self.checkpoint = torch.load(checkpoint_path, map_location=self.device)
+
+    if self.verbose == 2:
+      print('\tcheckpoint successfully loaded.')
+
+    # instantiate PocketGen model for pocket design
+    self.model = Pocket_Design_new(
+      self.config.model,
+      protein_atom_feature_dim=FeaturizeProteinAtom().feature_dim,
+      ligand_atom_feature_dim=FeaturizeLigandAtom().feature_dim,
+      device=self.device
+    )
+
+    if self.verbose == 2:
+      print("\tPocketGen model successfully instantiated.")
+
+    # send model to selected device
+    self.model = self.model.to(self.device)
+
+    if self.verbose == 2:
+      print('\tPocketGen model sent to selected device.')
+
+    # load current saved checkpoint into model
+    self.model.load_state_dict(self.checkpoint['model'])
+
+    if self.verbose == 2:
+      print('\tcheckpoint loaded into PocketGen.')
+      print('End of setup, model can now be used.\n\n')
+  
+
+  def input(self, receptor_path:str, ligand_path:str) -> "Model":
+    """
+    Loads a protein receptor and a ligand from files and stores them in 
+    a data loader, usable by the model when generating mutants.
+    @param ligand_path (str): path to the ligand SDF file.
+    @param receptor_path (str): path to the receptor PDB file.
+    @return (Model): the instance of Model, for chainability purposes.
+    """
+
+    if self.verbose == 2:
+      print('Now parsing data from receptor and ligand :')
+    
+    # get dense features from receptor-ligand interaction
+    features = interaction(receptor_path, ligand_path)
+
+    if self.verbose == 2:
+      print('\tsuccessfully parsed interaction features.\n')
+      print('Now building the pytorch dataloader :')
+
+    # initialize the data loader (including batch converter)
+    self.loader = DataLoader(
+      [features for _ in range(self.size)],
+      batch_size=1, 
+      shuffle=False,
+      num_workers=self.config.train.num_workers,
+      collate_fn=partial(
+        collate_mols_block, 
+        batch_converter=self.alphabet.get_batch_converter()
+      )
+    )
+
+    # stores the source input files to compare
+    self.sources = [receptor_path, ligand_path]
+
+    if self.verbose == 2:
+      print('\tpytorch dataloader built correctly.')
+
+    return self
+
+  
+  def generate(self) -> "Model":
+    """
+    Generates mutants based on the input protein receptor.
+    @return (Model): the instance of Model, for chainability purposes.
+    """
+
+    if self.verbose > 0:
+      print("Now generating new mutant protein receptors :")
+
+    # place it in eval mode
+    self.model.eval()
+
+    # creates the inference directory
+    n_runs = self._nruns()
+    run_dir = os.path.join(self.outputdir, f"run_{n_runs}")
+    os.makedirs(run_dir)
+
+    # no need to compute gradients during inference
+    with torch.no_grad():
+      for b, batch in enumerate(self.loader):
+        # move batch to selected device
+        batch = {k: v.to(self.device) if isinstance(v, torch.Tensor) else v for k, v in batch.items()}
+
+        # starts the inference for a single mutant
+        self.model.generate(
+          batch, target_path=os.path.join(run_dir, f"mutant_{b}")
+        )
+        
+        # stores the original input files for comparison
+        os.makedirs(os.path.join(run_dir, "original"), exist_ok=True)
+        shutil.copyfile(self.sources[0], os.path.join(run_dir, "original", "orig_receptor.pdb"))
+        shutil.copyfile(self.sources[1], os.path.join(run_dir, "original", "orig_ligand.sdf"))
+        
+        if self.verbose > 0:
+          print("\tinference done on a batch.")
+
+    return self
+  
+  
+  def results(self) -> "Model":
+    """
+    write results in a summary file, along with all generated PDBs.
+    @return (Model): the instance of Model, for chainability purposes.
+    """
+
+    if self.verbose > 0:
+      print("Now writing output files :")
+    
+    for run in range(self._nruns()):
+      run_dir = os.path.join(self.outputdir, f"run_{run}")
+
+      # initialize the resulting summary TSV
+      summary = "ID\tdelta_G\tKd\tmutations (AA)\n"
+
+      # write original inputs docking in summary
+      src_mean_dg, src_mean_kd = self._dock(
+        os.path.join(run_dir, "original", "orig_receptor.pdb"),
+        os.path.join(run_dir, "original", "orig_ligand.sdf")
+      )
+
+      summary += f"original\t{src_mean_dg}\t{src_mean_kd}\t0" + "\n"
+
+      for b in range(self._nbatches(run_dir)):
+        receptor_path = os.path.join(run_dir, f"mutant_{b}", f"{b}_whole.pdb")
+        ligand_path = os.path.join(run_dir, f"mutant_{b}", f"{b}.sdf")
+
+        mean_dg, mean_kd = self._dock(receptor_path, ligand_path)
+
+        # find the number of mutations (AA-level)
+        n_mutations = mutations(
+          os.path.join(run_dir, "original", "orig_receptor.pdb"),
+          receptor_path
+        )
+
+        summary += f"mutant_{b}\t{mean_dg}\t{mean_kd}\t{n_mutations}" + "\n"
+
+        if self.verbose == 2:
+          print("\twrote one new entry in the summary file.")
+      
+      # write summary to a local file
+      with open(os.path.join(run_dir, "summary.tsv"), "w") as file:
+        file.write(summary)
+
+      if self.verbose > 0:
+        print(f"You can find the run #{run} summary in your output folder.")
+
+    return self
+  
+
+  def _dock(self, receptor_path, ligand_path):
+
+    # compute the docking window around ligand
+    docking_box = compute_box(receptor_path, ligand_path)
+
+    try:
+      energies = docking(
+        receptor_file=prepare(receptor_path),
+        ligand_file=prepare(ligand_path),
+        center=docking_box["center"],
+        box_size=docking_box["size"]
+      )
+    except Exception as e:
+      print(f"\t\terror simulating docking: {e}")
+      energies = np.zeros(1)
+
+    # calculates the mean Kd and deltaG
+    return np.mean(energies), np.mean([kd(e) for e in energies])
+
+
+  def _nruns(self) -> int:
+    """
+    returns the number of inference runs stored so far in the output directory
+    @return (int): the number of folders in the output directory
+    """
+
+    os.makedirs(self.outputdir, exist_ok=True)
+    return len([f for f in os.listdir(self.outputdir) if os.path.isdir(os.path.join(self.outputdir, f))])
+
+  def _nbatches(self, run_path) -> int:
+    """
+    returns the number of mutant folders stored in a run directory
+    @return (int): the number of folders in the run directory, excluding "original"
+    """
+
+    os.makedirs(run_path, exist_ok=True)
+    return len([f for f in os.listdir(run_path) if os.path.isdir(os.path.join(run_path, f))]) - 1
\ No newline at end of file
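Note on `model/Model.py`: the README lists a rank among the summary fields, while `results()` writes only `ID`, `delta_G`, `Kd` and the mutation count. A minimal sketch of deriving a rank from `summary.tsv` follows, assuming that a lower (more negative) `delta_G` means stronger predicted binding. `rank_summary` is a hypothetical helper and the numbers in `example` are made up for illustration, not results of an actual run.

```python
import csv
import io


def rank_summary(tsv_text: str) -> list:
    """Return (rank, ID, delta_G) tuples sorted from best to worst score."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    rows.sort(key=lambda row: float(row["delta_G"]))
    return [
        (rank, row["ID"], float(row["delta_G"]))
        for rank, row in enumerate(rows, start=1)
    ]


# same header as the summary written by results(); values are illustrative
example = (
    "ID\tdelta_G\tKd\tmutations (AA)\n"
    "original\t-6.1\t29800.0\t0\n"
    "mutant_0\t-7.4\t268000.0\t3\n"
    "mutant_1\t-5.2\t6520.0\t2\n"
)

if __name__ == "__main__":
    for rank, name, delta_g in rank_summary(example):
        print(rank, name, delta_g)
```

Keeping the original receptor in the ranking shows at a glance which mutants score better than the starting structure.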
--- a/model/featurize.py
+++ b/model/featurize.py
+from PocketGen.utils.transforms import FeaturizeProteinAtom, FeaturizeLigandAtom
+from torch_geometric.transforms import Compose
+import torch
+
+def densify(features:dict) -> torch.Tensor:
+  """
+  Transforms a set of human-level features to a dense data tensor.
+  @param features (dict): a feature-dict returned by featurize()
+  @return (torch.Tensor): a dense-data torch tensor representing features.
+  """
+
+  return Compose([
+    FeaturizeProteinAtom(),
+    FeaturizeLigandAtom(),
+  ])(features)
+
+
+def featurize(
+  protein_dict={}, 
+  ligand_dict={}, 
+  residue_dict={}, 
+  seq=None, 
+  full_seq_index=None,
+  r10_index=None) -> dict:
+
+  """
+  Transforms molecule interaction data into a feature 
+  dict that is interpretable by the densify function.
+  @param protein_dict (dict): a dictionary representation of the receptor
+  @param ligand_dict (dict): a dictionary representation of the ligand
+  @param residue_dict (dict): a dictionary representation of the residue
+  @param seq (str): amino-acid sequence of the whole receptor
+  @param full_seq_index (torch.Tensor): indexes of the residues (r < 3.5 around ligand)
+  @param r10_index (torch.Tensor): indexes of the residues (r < 10 around ligand)
+  @return (dict): a feature dictionary
+  """
+
+  # concatenates the first 3 dicts (prot, lig and residue)
+  features = dict({f"protein_{k}":v for k,v in protein_dict.items()}, 
+    **{f"ligand_{k}":v for k,v in ligand_dict.items()})
+  features.update(residue_dict)
+
+  # adds keys for simple variables
+  features.update({
+    'full_seq_idx': full_seq_index,
+    'r10_idx': r10_index,
+    'seq': seq
+  })
+
+  return features
\ No newline at end of file
--- a/model/sampler.py
+++ b/model/sampler.py
+import torch
+from .featurize import densify, featurize
+from PocketGen.utils.protein_ligand import PDBProtein, parse_sdf_file
+from PocketGen.utils.data import torchify_dict
+
+def interaction(receptor_path:str, ligand_path:str) -> torch.Tensor:
+  """
+  Convert PDB and SDF files into a set of protein-ligand interaction features.
+  @param ligand_path (str): path to the ligand SDF file.
+  @param receptor_path (str): path to the receptor PDB file.
+  @return (torch.Tensor): a data-dense feature tensor representing the interaction.
+  """
+
+  # read and parses the mol (pdb / sdf) files
+  with open(receptor_path, 'r') as pdb_file:
+    pdb_block = pdb_file.read()
+  protein = PDBProtein(pdb_block)
+  ligand_dict = parse_sdf_file(ligand_path, feat=False)
+
+  # select only the residues inside a radius around the ligand
+  r10_index, r10_residues = protein.query_residues_ligand(ligand_dict, radius=10, selected_residue=None, return_mask=False)
+  full_seq_index, full_seq_residues = protein.query_residues_ligand(ligand_dict, radius=3.5, selected_residue=r10_residues, return_mask=False)
+
+  # defines pocket from the (r < 10) residues
+  pocket = PDBProtein(protein.residues_to_pdb_block(r10_residues))
+  pocket_dict = pocket.to_dict_atom()
+  residue_dict = pocket.to_dict_residue()
+
+  # defines the scope of protein_edit_residue (should be of type torch.Tensor[bool])
+  _, residue_dict['protein_edit_residue'] = pocket.query_residues_ligand(ligand_dict)
+
+  full_seq_index.sort()
+  r10_index.sort()
+
+  # transforms data into features
+  data = featurize(
+    protein_dict=torchify_dict(pocket_dict),
+    ligand_dict=torchify_dict(ligand_dict),
+    residue_dict=torchify_dict(residue_dict),
+    seq=''.join(protein.to_dict_residue()['seq']),
+    full_seq_index=torch.tensor(full_seq_index),
+    r10_index=torch.tensor(r10_index)
+  )
+
+  # add metadata
+  data.update({
+    'protein_filename': receptor_path,
+    'ligand_filename': ligand_path,
+    'whole_protein_name': receptor_path 
+  })
+
+  # return data-dense features tensor
+  return densify(data)
\ No newline at end of file