{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Example 5.1 GBCA Benchmark\n", "\n", "The Average Steric Occupancy (ASO) descriptor was originally developed in the Denmark lab to capture the dynamic nature of sterics in a molecule. This measures whether an indiviudal conformer is occupying a grid point or not and assigns it a value of 0 if unoccupied and a value of 1 if occupied. This then averages the amount a grid-point was occupied over the number of conformers calculated, giving a value between 0 and 1. More information can be found at [**DOI**:10.1126/science.aau5631](https://www.science.org/doi/10.1126/science.aau5631)\n", "\n", "We have significantly accelerated this descriptor calculation such that it can operate on massive `ConformerLibraries`. This has also been made availble to parallelize for further acceleration. An example of an ASO calculation run through the command line is shown below\n", "\n", "## Hardware Specification for Rerun\n", "\n", "Desktop workstation with 2x (AMD EPYC 7702 64-Core) with total of 128 physical and 256 logical cores, 1024 GB DDR4 with Ubuntu 22.04 LTS operating system." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import molli as ml\n", "import numpy as np\n", "import h5py\n", "import timeit\n", "import molli_xt\n", "from scipy.spatial.distance import cdist\n", "\n", "\n", "# function definitions for benchmarking comparison\n", "def aso_naive(ens: ml.ConformerEnsemble, grid: np.ndarray):\n", " from math import dist\n", "\n", " aso = np.zeros(grid.shape[0])\n", " for i, gp in enumerate(grid):\n", " N = 0\n", " for conformer in ens:\n", " if any(\n", " dist(conformer.coords[j], gp) <= a.vdw_radius\n", " for j, a in enumerate(ens.atoms)\n", " ):\n", " N += 1\n", "\n", " aso[i] = N / ens.n_conformers\n", "\n", " return aso\n", "\n", "\n", "def aso_numpy(ens: ml.ConformerEnsemble, grid: np.ndarray):\n", " aso_full = np.empty((ens.n_conformers, grid.shape[0]))\n", "\n", " # Iterate over conformers in the ensemble\n", " # Complexity (O(n_confs * n_gpts))\n", " for i, c in enumerate(ens):\n", " # Iterate over atoms\n", " for j, a in enumerate(c.atoms):\n", " where = np.sum((grid - c.coords[j]) ** 2, axis=-1) <= a.vdw_radius**2\n", " aso_full[i, where] = 1\n", "\n", " return np.average(\n", " aso_full,\n", " axis=0,\n", " )\n", "\n", "\n", "def aso_scipy_cdist(ens: ml.ConformerEnsemble, grid: np.ndarray):\n", " aso_full = np.empty((ens.n_conformers, grid.shape[0]))\n", "\n", " # Iterate over conformers in the ensemble\n", " # Complexity (O(n_confs * n_gpts))\n", " vdwr = np.array([a.vdw_radius for a in ens.atoms])\n", " for i, c in enumerate(ens):\n", " alldist = cdist(grid, c.coords, \"euclidean\")\n", " where = np.any(alldist >= vdwr[np.newaxis, :], axis=-1)\n", " aso_full[i, where] = 1\n", "\n", " return np.average(\n", " aso_full,\n", " axis=0,\n", " )\n", "\n", "\n", "def aso_molli_cdist(ens: ml.ConformerEnsemble, grid: np.ndarray):\n", " alldist = molli_xt.cdist32f_eu2(ens._coords, grid)\n", " vdwr2s = np.array([a.vdw_radius for a in ens.atoms]) ** 2\n", " diff = alldist <= vdwr2s[:, None]\n", " np.average(np.any(diff, axis=1), axis=0)\n", "\n", "\n", "def aso_molli_kdtree_cdist(ens: ml.ConformerEnsemble, grid: np.ndarray):\n", " pruned = ml.descriptor.prune(grid, ens, max_dist=1.8)\n", " aso = np.zeros(grid.shape[0])\n", " aso[pruned] = aso_molli_cdist(ens, grid[pruned])\n", " return aso" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ConformerEnsemble(name='65_vi', formula='C44 H37 O6 P1', n_conformers=215), ens.n_atoms=88\n", "grid15: aso_naive : 202.560668\n", "grid15: aso_numpy : 5.981086\n", "grid15: aso_scipy_cdist : 0.770540\n", "grid15: aso_molli_cdist : 0.688863\n", "grid15: aso_molli_kdtree_cdist : 0.163643\n", "grid10: aso_naive : 656.652245\n", "grid10: aso_numpy : 18.337421\n", "grid10: aso_scipy_cdist : 2.529017\n", "grid10: aso_molli_cdist : 2.139469\n", "grid10: aso_molli_kdtree_cdist : 0.516551\n", "grid07: aso_naive : 1908.888225\n", "grid07: aso_numpy : 50.978774\n", "grid07: aso_scipy_cdist : 7.755869\n", "grid07: aso_molli_cdist : 5.844610\n", "grid07: aso_molli_kdtree_cdist : 1.472854\n" ] } ], "source": [ "if __name__ == \"__main__\":\n", " ens = ml.ConformerEnsemble.load_mol2(\"65_vi.mol2\")\n", " print(f\"{ens}, {ens.n_atoms=}\")\n", "\n", " calculators = (\n", " aso_naive,\n", " aso_numpy,\n", " aso_scipy_cdist,\n", " aso_molli_cdist,\n", " aso_molli_kdtree_cdist,\n", " )\n", "\n", " grids = {}\n", " for gn in (\"grid15\", \"grid10\", \"grid07\"):\n", " with h5py.File(f\"bpa_aligned_{gn}.hdf5\", \"r\") as f:\n", " grids[gn] = np.asarray(f[\"grid\"])\n", "\n", " for gn, grid in grids.items():\n", " for calc in calculators:\n", " if calc is aso_naive:\n", " times = timeit.Timer(\n", " \"\"\"calc(ens, grid)\"\"\",\n", " globals=globals(),\n", " ).repeat(1, 1)\n", " else:\n", " times = timeit.Timer(\n", " \"\"\"calc(ens, grid)\"\"\",\n", " globals=globals(),\n", " ).repeat(5, 3)\n", "\n", " print(f\"{gn}: {calc.__name__:<30}: {min(times):0.6f}\")\n" ] } ], "metadata": { "kernelspec": { "display_name": "dev-blake", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.6" } }, "nbformat": 4, "nbformat_minor": 2 }