Example 6.1: Calculate ASO and AEIF descriptors

Prerequisites

To calculate the descriptors, an aligned conformer library is required (e.g. in ../00-libraries/box-aligned.clib).

Hardware Specification for Rerun

Desktop workstation with 2x AMD EPYC 7702 (64-core) processors, totaling 128 physical and 256 logical cores, 1024 GB of DDR4 RAM, running Ubuntu 22.04 LTS.

Step 1. Calculate the grid file

This step performs three operations:

  1. A regular grid is constructed over the bounding box of the conformer library (optionally padded, see --padding), with points placed at the chosen spacing.

  2. The grid is then (optionally) pruned using a k-D tree search, so that points far away from the molecule (as controlled by the max_dist and eps values of --prune) are not taken into account when computing the interaction field.

  3. Finally, the nearest atom to each grid point is located, forming an “atom index indicator field”.
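
The three operations can be sketched with NumPy and SciPy as follows. This is a minimal illustration, not molli's implementation: it assumes the number of points per axis is ceil(extent / spacing), uses scipy.spatial.cKDTree with max_dist as a plain distance cutoff (the eps parameter is not modeled), and the helper names make_grid, prune_indices and nearest_atom_indices are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def make_grid(lo, hi, spacing=1.0, padding=0.0):
    """Regular grid over the (optionally padded) bounding box, one row per point."""
    lo = np.asarray(lo, dtype=float) - padding
    hi = np.asarray(hi, dtype=float) + padding
    counts = np.ceil((hi - lo) / spacing).astype(int)  # assumption: points per axis
    axes = [lo[i] + spacing * np.arange(counts[i]) for i in range(3)]
    return np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)


def prune_indices(grid, ensemble, max_dist=2.0):
    """Indices of grid points within max_dist of at least one atom of any conformer.

    max_dist is used as a plain cutoff; molli's eps parameter is not modeled (assumption).
    """
    tree = cKDTree(ensemble.reshape(-1, 3))
    dist, _ = tree.query(grid, k=1, distance_upper_bound=max_dist)
    return np.flatnonzero(np.isfinite(dist))  # points with no atom in range get dist = inf


def nearest_atom_indices(grid, ensemble):
    """For each conformer, the index of the atom closest to every grid point."""
    return np.stack([cKDTree(conf).query(grid, k=1)[1] for conf in ensemble])


# Toy ensemble: 2 conformers x 2 atoms (coordinates in Å)
ensemble = np.array([[[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]],
                     [[0.0, 1.0, 0.0], [3.0, 1.0, 0.0]]])
flat = ensemble.reshape(-1, 3)
grid = make_grid(flat.min(axis=0), flat.max(axis=0), spacing=1.0, padding=2.0)
keep = prune_indices(grid, ensemble, max_dist=2.0)
nearest = nearest_atom_indices(grid[keep], ensemble)  # shape: (n_conformers, n_kept_points)
print(grid.shape, keep.size, nearest.shape)
```

Pruning keeps only points near some atom of some conformer, so in this sketch the nearest-atom search only has to run on grid[keep].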

# This will display some help for the grid calculator
!molli grid --help
usage: molli grid [-h] [-o <fpath>] [-n NPROCS] [-p 0.0] [-s 1.0]
                  [-b BATCHSIZE] [--prune [<max_dist>:<eps>]]
                  [--nearest [NEAREST]] [--overwrite] [--dtype DTYPE]
                  library

Read a molli library and calculate a grid

positional arguments:
  library               Conformer library file to perform the calculations on

options:
  -h, --help            show this help message and exit
  -o <fpath>, --output <fpath>
                        Destination for calculation results
  -n NPROCS, --nprocs NPROCS
                        Specifies the number of jobs for constructing a grid
  -p 0.0, --padding 0.0
                        The bounding box will be padded by this many angstroms
                        prior to grid construction
  -s 1.0, --spacing 1.0
                        Intervals at which the grid points will be placed
  -b BATCHSIZE, --batchsize BATCHSIZE
                        Number of molecules to be treated simulateneously
  --prune [<max_dist>:<eps>]
                        Obtain the pruning indices for each conformer ensemble
  --nearest [NEAREST]   Obtain nearest atom indices for conformer ensembles.
                        This is necessary for indicator field descriptors.
                        Accepts up to 1 parameter which corresponds to the
                        cutoff distance.
  --overwrite           Overwrite the existing grid file
  --dtype DTYPE         Specify the data format to be used for grid parameter
                        storage.
# Calculate the grid with 1.0 Å spacing, 32 parallel processes and 128 molecules per batch,
# with pruning and nearest-atom location enabled

!molli grid -s 1.0 -n 32 -b 128 box_aligned.mlib --prune --nearest
Using output file: box_aligned_grid.hdf5
Successfully imported grid and bbox data from the previous calculation
Vectors: [ -9.027 -14.043 -10.131], [16.825 14.178 15.219]. Number of grid points: 19604. Volume: 18495.13 A**3.
Requested to calculate grid pruning with max_dist=2.000 eps=0.500
Skipping the pruning: all keys have been found already!
Nearest atoms:: 100%|█████████████████████████| 567/567 [01:53<00:00,  5.00it/s]
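
As a quick consistency check on the log above, the grid size can be recovered from the printed bounding-box corners. The sketch assumes the number of points per axis is ceil(extent / spacing) (floor(extent / spacing) + 1 gives the same counts here); this is an inference from the numbers, not taken from molli's source.

```python
import math

# Bounding-box corners as printed by `molli grid` above (Å)
lo = (-9.027, -14.043, -10.131)
hi = (16.825, 14.178, 15.219)
spacing = 1.0  # the -s value used above

extents = [h - l for l, h in zip(lo, hi)]
# Assumption: points per axis = ceil(extent / spacing)
counts = [math.ceil(e / spacing) for e in extents]
volume = math.prod(extents)

print(counts, math.prod(counts), round(volume, 2))  # prints [26, 29, 26] 19604 18494.58
```

The 26 × 29 × 26 = 19604 points match the log. The volume differs slightly from the printed 18495.13 A**3, presumably because the corners are shown rounded to three decimals.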

Note: the --prune option is required for the accelerated GBCA descriptor calculation but is unnecessary for a standard grid calculation. Likewise, --nearest is needed for indicator field descriptors.
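
One way to see why pruning can accelerate a grid-based descriptor: if the descriptor is zero at every point farther than max_dist from all atoms (true for an occupancy-type field when every atomic radius is at most max_dist), it only needs to be evaluated at the kept points and can be scattered back into a full-length vector. The sketch below checks this on random data. It is an assumption about how the accelerated GBCA path benefits, a brute-force distance check stands in for the k-D tree, and the helper names and the 1.7 Å radii are hypothetical.

```python
import numpy as np


def occupancy(points, coords, radii):
    """1.0 where a point lies inside the radius of any atom, else 0.0."""
    d = np.linalg.norm(points[:, None, :] - coords[None, :, :], axis=-1)
    return (d <= radii[None, :]).any(axis=1).astype(float)


def kept_indices(grid, coords, max_dist):
    """Brute-force stand-in for k-D tree pruning: points within max_dist of some atom."""
    d = np.linalg.norm(grid[:, None, :] - coords[None, :, :], axis=-1)
    return np.flatnonzero(d.min(axis=1) <= max_dist)


def occupancy_pruned(grid, coords, radii, keep):
    """Evaluate only on kept points; pruned points stay 0.0."""
    full = np.zeros(len(grid))
    full[keep] = occupancy(grid[keep], coords, radii)
    return full


rng = np.random.default_rng(0)
grid = rng.uniform(-6, 6, size=(2000, 3))
coords = rng.uniform(-2, 2, size=(10, 3))
radii = np.full(10, 1.7)  # hypothetical radii, all <= max_dist
keep = kept_indices(grid, coords, max_dist=2.0)
same = np.array_equal(occupancy(grid, coords, radii),
                      occupancy_pruned(grid, coords, radii, keep))
print(len(keep), "of", len(grid), "points evaluated; identical result:", same)
```

The saving grows with the fraction of the bounding box that lies far from every atom, which is typically large for a box enclosing many aligned molecules.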

Step 2. Calculation of the descriptors

With the grid computed and pruned to the most relevant points, we can now compute the ASO (average steric occupancy) and/or AEIF (average electronic indicator field) descriptors.
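
ASO can be sketched as follows, assuming its usual description: a grid point counts as occupied in a conformer when it lies within the radius of any atom, and the descriptor is the average of this 0/1 indicator over the ensemble (weighted when conformer weights are used, cf. the -w option listed below). The function aso_sketch and its radii input are hypothetical; molli's radii, cutoffs and storage format may differ.

```python
import numpy as np


def aso_sketch(grid, ensemble, radii, weights=None):
    """ASO-style descriptor: weighted fraction of conformers occupying each grid point.

    grid:     (n_points, 3) grid coordinates
    ensemble: (n_conformers, n_atoms, 3) aligned conformer coordinates
    radii:    (n_atoms,) atomic radii (hypothetical input, e.g. van der Waals radii)
    weights:  optional (n_conformers,) conformer weights, normalized here
    """
    n_conf = ensemble.shape[0]
    w = np.ones(n_conf) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    # distances from every grid point to every atom of every conformer: (n_conf, n_points, n_atoms)
    d = np.linalg.norm(grid[None, :, None, :] - ensemble[:, None, :, :], axis=-1)
    occupied = (d <= radii[None, None, :]).any(axis=2)  # (n_conf, n_points)
    return w @ occupied.astype(float)


# Two one-atom conformers, each occupying a different grid point
grid = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
ensemble = np.array([[[0.0, 0.0, 0.0]], [[5.0, 0.0, 0.0]]])
radii = np.array([1.5])
print(aso_sketch(grid, ensemble, radii))                  # expected: [0.5, 0.5, 0.0]
print(aso_sketch(grid, ensemble, radii, weights=[3, 1]))  # expected: [0.75, 0.25, 0.0]
```

The dense distance tensor keeps the sketch short; for real libraries with tens of thousands of grid points, restricting the evaluation to pruned points or using a k-D tree is what keeps memory and time manageable.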

!molli gbca --help
usage: molli gbca [-h] [-w] [-n 128] [-b 128] [-g <grid.hdf5>]
                  [-o <lib_aso.hdf5>] [--dtype DTYPE] [--overwrite]
                  {aso,aeif} CLIB_FILE

This module can be used for standalone computation of descriptors

positional arguments:
  {aso,aeif}            This selects the specific descriptor to compute.
  CLIB_FILE             Conformer library to perform the calculation on

options:
  -h, --help            show this help message and exit
  -w, --weighted        Apply the weights specified in the conformer files
  -n 128, --nprocs 128  Selects number of processors for python
                        multiprocessing application. If the program is
                        launched via MPI backend, this parameter is ignored.
  -b 128, --batchsize 128
                        Number of conformer ensembles to be processed in one
                        batch.
  -g <grid.hdf5>, --grid <grid.hdf5>
                        File that contains the information about the
                        gridpoints.
  -o <lib_aso.hdf5>, --output <lib_aso.hdf5>
                        File that contains the information about the
                        gridpoints.
  --dtype DTYPE         Specify the data format to be used for grid parameter
                        storage.
  --overwrite           Overwrite the existing descriptor file
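
The nearest-atom indices from Step 1 are what make indicator-field descriptors possible. A simplified sketch: look up a per-atom property (for example a partial charge) at each point's nearest atom, then average over conformers. This only illustrates the indexing idea; molli's AEIF may restrict, scale or weight grid points differently, and indicator_field_sketch, atom_props and the charge values are hypothetical.

```python
import numpy as np


def indicator_field_sketch(nearest_idx, atom_props, weights=None):
    """Average over conformers of the property of the nearest atom at each grid point.

    nearest_idx: (n_conformers, n_points) nearest-atom indices ("atom index indicator field")
    atom_props:  (n_atoms,) per-atom property shared by all conformers (hypothetical input)
    weights:     optional (n_conformers,) conformer weights, normalized here
    """
    n_conf = nearest_idx.shape[0]
    w = np.ones(n_conf) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    per_conformer = atom_props[nearest_idx]  # (n_conformers, n_points)
    return w @ per_conformer


# Two conformers, three grid points, two atoms with hypothetical partial charges
nearest_idx = np.array([[0, 1, 1],
                        [0, 0, 1]])
charges = np.array([-0.4, 0.2])
field = indicator_field_sketch(nearest_idx, charges)
print(field)  # approximately [-0.4, -0.1, 0.2]
```

Because the nearest-atom indices are stored once per grid, the same index field can be reused with different per-atom properties without repeating the geometric search.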
!molli gbca aso box_aligned.mlib -n 64 -b 128
To be computed: 72542 ensembles. Skipping 0
Computing descriptor ASO: 100%|███████████████| 567/567 [02:56<00:00,  3.21it/s]
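
The progress bars count batches, not ensembles: with -b 128, the 72542 ensembles reported above split into 567 batches, matching the 567/567 shown here and in the nearest-atom step. A minimal sketch of such batching follows; the batches helper is hypothetical, and molli's actual scheduling across the -n processes is not reproduced.

```python
import math


def batches(n_items, batch_size):
    """Yield (start, stop) index ranges covering n_items in chunks of batch_size."""
    for start in range(0, n_items, batch_size):
        yield start, min(start + batch_size, n_items)


n_ensembles = 72542  # from the "To be computed" line above
chunks = list(batches(n_ensembles, 128))
print(len(chunks), math.ceil(n_ensembles / 128), chunks[-1])  # prints 567 567 (72448, 72542)
```

The last batch holds only the remaining 94 ensembles, so the batch count is ceil(72542 / 128) rather than an exact division.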