Example 6.1: Calculate ASO and AEIF descriptors
Prerequisites
Calculating the descriptors requires an aligned conformer library (e.g. ../00-libraries/box-aligned.clib).
Hardware Specification for Rerun
Desktop workstation with 2x AMD EPYC 7702 (64-core) processors, for a total of 128 physical and 256 logical cores, and 1024 GB of DDR4 memory, running Ubuntu 22.04 LTS.
Step 1. Calculate the grid file
This step does three things:
The grid is constructed from the bounding box of the library.
The grid is then (optionally) pruned using a k-d tree algorithm, so that grid points farther than a cutoff distance from the molecule are not taken into account when computing the interaction field.
Lastly, the atom nearest to each grid point is located, forming an “atom index indicator field”.
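The three operations above can be illustrated with a minimal pure-Python sketch. It is not molli's implementation: the function names are hypothetical, a brute-force distance search stands in for the k-d tree, only a single conformer is pruned, and the `eps` parameter reported in the log is not modelled.

```python
import itertools
import math


def bounding_box(conformers):
    """Return (lower, upper) corners enclosing every atom of every conformer."""
    atoms = [xyz for conf in conformers for xyz in conf]
    lower = tuple(min(a[i] for a in atoms) for i in range(3))
    upper = tuple(max(a[i] for a in atoms) for i in range(3))
    return lower, upper


def make_grid(lower, upper, spacing=1.0, padding=0.0):
    """Place points every `spacing` angstroms inside the padded box."""
    axes = []
    for lo, hi in zip(lower, upper):
        lo, hi = lo - padding, hi + padding
        n = math.floor((hi - lo) / spacing) + 1
        axes.append([lo + k * spacing for k in range(n)])
    return list(itertools.product(*axes))


def nearest_atom(point, conformer):
    """Return (index, distance) of the atom closest to `point` (brute force)."""
    dists = [math.dist(point, atom) for atom in conformer]
    idx = min(range(len(dists)), key=dists.__getitem__)
    return idx, dists[idx]


def prune(grid, conformer, max_dist=2.0):
    """Indices of grid points within `max_dist` of at least one atom."""
    return [i for i, p in enumerate(grid)
            if nearest_atom(p, conformer)[1] <= max_dist]


# Toy ensemble: two conformers of a two-atom "molecule" (made-up coordinates).
conformers = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 1.5, 0.0)],
]
lower, upper = bounding_box(conformers)
grid = make_grid(lower, upper, spacing=1.0, padding=2.0)
kept = prune(grid, conformers[0], max_dist=2.0)
nearest = [nearest_atom(grid[i], conformers[0])[0] for i in kept]
print(len(grid), len(kept))
```

For a real library the brute-force search scales as (grid points) x (atoms), which is the cost a k-d tree is meant to avoid.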
[5]:
# This will display some help for the grid calculator
!molli grid --help
usage: molli grid [-h] [-o <fpath>] [-n NPROCS] [-p 0.0] [-s 1.0]
[-b BATCHSIZE] [--prune [<max_dist>:<eps>]]
[--nearest [NEAREST]] [--overwrite] [--dtype DTYPE]
library
Read a molli library and calculate a grid
positional arguments:
library Conformer library file to perform the calculations on
options:
-h, --help show this help message and exit
-o <fpath>, --output <fpath>
Destination for calculation results
-n NPROCS, --nprocs NPROCS
Specifies the number of jobs for constructing a grid
-p 0.0, --padding 0.0
The bounding box will be padded by this many angstroms
prior to grid construction
-s 1.0, --spacing 1.0
Intervals at which the grid points will be placed
-b BATCHSIZE, --batchsize BATCHSIZE
Number of molecules to be treated simulateneously
--prune [<max_dist>:<eps>]
Obtain the pruning indices for each conformer ensemble
--nearest [NEAREST] Obtain nearest atom indices for conformer ensembles.
This is necessary for indicator field descriptors.
Accepts up to 1 parameter which corresponds to the
cutoff distance.
--overwrite Overwrite the existing grid file
--dtype DTYPE Specify the data format to be used for grid parameter
storage.
[6]:
# Calculating the grid with 1.0 A step size, 32 parallel processes, 128 molecules/batch and with
# pruning and nearest atom locations
!molli grid -s 1.0 -n 32 -b 128 box_aligned.mlib --prune --nearest
Using output file: box_aligned_grid.hdf5
Successfully imported grid and bbox data from the previous calculation
Vectors: [ -9.027 -14.043 -10.131], [16.825 14.178 15.219]. Number of grid points: 19604. Volume: 18495.13 A**3.
Requested to calculate grid pruning with max_dist=2.000 eps=0.500
Skipping the pruning: all keys have been found already!
Nearest atoms:: 100%|█████████████████████████| 567/567 [01:53<00:00, 5.00it/s]
Note: The --prune option is required for the accelerated GBCA descriptor calculation, but it is unnecessary for a standard grid calculation.
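As a sanity check, the grid size and box volume reported in the log above can be reproduced from the two printed corner vectors. The counting rule used here (ceil(extent / spacing) points per axis) is an assumption that happens to match the reported number; it is not taken from the molli source.

```python
import math

# Box corners copied from the log output above (angstroms, 3 decimals).
lower = (-9.027, -14.043, -10.131)
upper = (16.825, 14.178, 15.219)
spacing = 1.0  # value passed via -s

extents = [hi - lo for lo, hi in zip(lower, upper)]
volume = math.prod(extents)

# Assumed counting rule: ceil(extent / spacing) points along each axis.
points_per_axis = [math.ceil(e / spacing) for e in extents]
n_points = math.prod(points_per_axis)

print(points_per_axis, n_points)  # [26, 29, 26] 19604
print(volume)
```

The volume computed from the rounded corners comes out close to 18495 A**3, which agrees with the reported 18495.13 A**3 to within the precision of the printed coordinates.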
Step 2. Calculation of the descriptors
Having computed the grid and narrowed the list of points down to the most useful ones, we can finally compute the ASO and/or AEIF descriptors.
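To make the idea behind ASO (average steric occupancy) concrete, here is a minimal sketch of the descriptor as it is usually defined: a grid point counts as occupied in a conformer if it lies inside the van der Waals sphere of any atom, and the occupancy is averaged over the ensemble. This is an illustration under that assumption, not molli's accelerated implementation. The function name, the radius value, and the coordinates are hypothetical, and AEIF is not covered.

```python
import math


def aso(grid, conformers, radii, weights=None):
    """Average steric occupancy of each grid point over a conformer ensemble.

    grid:       list of (x, y, z) points
    conformers: list of conformers, each a list of (x, y, z) atom positions
    radii:      one van der Waals radius per atom (same order in every conformer)
    weights:    optional conformer weights; uniform if omitted
    """
    if weights is None:
        weights = [1.0] * len(conformers)
    total = sum(weights)
    result = []
    for point in grid:
        occupied = 0.0
        for conf, w in zip(conformers, weights):
            # Occupied if the point lies inside any atom's sphere.
            if any(math.dist(point, atom) <= r for atom, r in zip(conf, radii)):
                occupied += w
        result.append(occupied / total)
    return result


grid = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
conformers = [
    [(0.0, 0.0, 0.0)],  # conformer 1: single atom at the origin
    [(2.0, 0.0, 0.0)],  # conformer 2: same atom displaced by 2 angstroms
]
radii = [1.7]  # hypothetical van der Waals radius

print(aso(grid, conformers, radii))                      # [0.5, 0.5, 0.0]
print(aso(grid, conformers, radii, weights=[3.0, 1.0]))  # [0.75, 0.25, 0.0]
```

The second call mirrors what the -w/--weighted option suggests: conformers contribute in proportion to their weights instead of equally.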
[7]:
!molli gbca --help
usage: molli gbca [-h] [-w] [-n 128] [-b 128] [-g <grid.hdf5>]
[-o <lib_aso.hdf5>] [--dtype DTYPE] [--overwrite]
{aso,aeif} CLIB_FILE
This module can be used for standalone computation of descriptors
positional arguments:
{aso,aeif} This selects the specific descriptor to compute.
CLIB_FILE Conformer library to perform the calculation on
options:
-h, --help show this help message and exit
-w, --weighted Apply the weights specified in the conformer files
-n 128, --nprocs 128 Selects number of processors for python
multiprocessing application. If the program is
launched via MPI backend, this parameter is ignored.
-b 128, --batchsize 128
Number of conformer ensembles to be processed in one
batch.
-g <grid.hdf5>, --grid <grid.hdf5>
File that contains the information about the
gridpoints.
-o <lib_aso.hdf5>, --output <lib_aso.hdf5>
File that contains the information about the
gridpoints.
--dtype DTYPE Specify the data format to be used for grid parameter
storage.
--overwrite Overwrite the existing descriptor file
[11]:
!molli gbca aso box_aligned.mlib -n 64 -b 128
To be computed: 72542 ensembles. Skipping 0
Computing descriptor ASO: 100%|███████████████| 567/567 [02:56<00:00, 3.21it/s]
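The progress bar total is consistent with the batch size: assuming each iteration corresponds to one batch of up to 128 ensembles, the numbers in the output above can be checked with a few lines of arithmetic.

```python
import math

n_ensembles = 72542      # "To be computed" in the output above
batch_size = 128         # value passed via -b
elapsed_s = 2 * 60 + 56  # 02:56 from the progress bar

n_batches = math.ceil(n_ensembles / batch_size)
batches_per_s = n_batches / elapsed_s
ensembles_per_s = n_ensembles / elapsed_s

print(n_batches)                # 567
print(round(batches_per_s, 2))  # 3.22
print(round(ensembles_per_s))   # 412
```

The small difference from the 3.21 it/s shown by the progress bar is expected, since the elapsed time is displayed rounded to whole seconds.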