molli Command Line

About this tutorial

This tutorial illustrates a few fundamental principles of the new molli package. Old-style and new-style molli differ substantially, so this introduction should be useful to experienced users and newcomers alike.

Basic structure of molli package.

Subpackages

[16]:
# This is meant to be as iconic as `import numpy as np` :)
import molli as ml

Command line

molli features a number of standalone scripts for standard procedures, such as parsing a .cdxml file or compiling a collection.

[17]:
# This is a shell command
!molli --HELP
usage: molli [-C <file.yml>] [-L <file.log>] [-v] [-H] [-V]
             {list,align,combine,compile,gbca,grid,info,ls,parse,recollect,run,show,stats,test}

MOLLI package is an API that intends to create a concise and easy-to-use
syntax that encompasses the needs of cheminformatics (especially so, but not
limited to the workflows developed and used in the Denmark laboratory.

positional arguments:
  {list,align,combine,compile,gbca,grid,info,ls,parse,recollect,run,show,stats,test}
                        This is main command that invokes a specific
                        standalone routine in MOLLI. To get full explanation
                        of available commands, run `molli list`

options:
  -C <file.yml>, --CONFIG <file.yml>
                        Sets the file from which molli configuration will be
                        read from
  -L <file.log>, --LOG <file.log>
                        Sets the file that will contain the output of molli
                        routines.
  -v, --verbose         Sets the level of verbosity for molli output.
  -H, --HELP            show help message and exit
  -V, --VERSION         show program's version number and exit
[18]:
# This is a shell command
!molli list
molli align
molli combine
molli compile
molli gbca
molli grid
molli info
molli ls
molli parse
molli recollect
molli show
molli stats
molli test

Align

align aligns molecule libraries or conformer libraries to a “Query” mol2 file, which can be a minimal substructure that exists within the library. Note: this requires the rmsd and pandas packages, which are currently not dependencies of molli. They can be added via pip install rmsd and pip install pandas, or via conda install rmsd and conda install pandas.
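
Because rmsd and pandas are optional, it can help to check for them before calling align. The following is a minimal sketch using only the Python standard library; the helper name `missing_packages` is hypothetical and not part of molli.

```python
from importlib.util import find_spec


def missing_packages(names=("rmsd", "pandas")):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Install before running `molli align`:", ", ".join(missing))
else:
    print("rmsd and pandas are both available.")
```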

[19]:
!molli align -h
usage: molli align [-h] -i INPUT -q query_mol.mol2 [--rmsd {rmsd,scipy}]
                   [-o <aligned>] [-s STATS]

Read a conformer library and align it across given query

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        ConformerLibrary/MoleculeLibrary file to align
  -q query_mol.mol2, --query query_mol.mol2
                        Mol2 file with the reference query structure
  --rmsd {rmsd,scipy}   Method of rmsd calculation. Available are the default
                        and scipy
  -o <aligned>, --output <aligned>
                        Output file path and name w/o extension
  -s STATS, --stats STATS
                        True/False flag to save alignment statistics in the
                        separate file. Defaults to False.

Combine

combine allows combinatorial expansion of a library. One can view the core file as being full of base structures with certain attachment points. The substituents can be appended at different attachment points and with different methods, depending on the options chosen.
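
To get a feeling for how quickly the expanded library grows, the sketch below enumerates substituent assignments for each mode name accepted by `-m`. It assumes that `permutns`, `combns` and `combns_repl` behave like their itertools namesakes and that `same` places one substituent on every attachment point; this is an interpretation of the mode names, not a description of molli's implementation. The function `substituent_choices` and the substituent labels are hypothetical.

```python
from itertools import combinations, combinations_with_replacement, permutations


def substituent_choices(substituents, n_points, mode):
    """List the substituent tuples placed on `n_points` attachment points.

    Assumption: each mode name is read as its itertools namesake.
    """
    if mode == "same":
        return [(s,) * n_points for s in substituents]
    if mode == "permutns":
        return list(permutations(substituents, n_points))
    if mode == "combns":
        return list(combinations(substituents, n_points))
    if mode == "combns_repl":
        return list(combinations_with_replacement(substituents, n_points))
    raise ValueError(f"unknown mode: {mode}")


subs = ["Me", "Et", "Ph"]  # placeholder substituent names
for mode in ("same", "permutns", "combns", "combns_repl"):
    print(mode, len(substituent_choices(subs, 2, mode)))
```

Under these assumptions, three substituents on a core with two attachment points give 3, 6, 3 and 6 assignments per core, respectively.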

[20]:
!molli combine -h
usage: molli combine [-h] -s <substituents.mlib>
                     [-m {same,permutns,combns,combns_repl}]
                     [-a ATTACHMENT_POINTS] [-n 1] [-b 1] -o <combined.mlib>
                     [-sep SEPARATOR] [--hadd]
                     [--obopt [ff maxiter tol disp ...]] [--overwrite]
                     cores

Combines two lists of molecules together

positional arguments:
  cores                 Base library file to combine wth substituents

options:
  -h, --help            show this help message and exit
  -s <substituents.mlib>, --substituents <substituents.mlib>
                        Substituents to add at each attachment of a core file
  -m {same,permutns,combns,combns_repl}, --mode {same,permutns,combns,combns_repl}
                        Method for combining substituents
  -a ATTACHMENT_POINTS, --attachment_points ATTACHMENT_POINTS
                        Label used to find attachment points
  -n 1, --nprocs 1      Number of processes to be used in parallel
  -b 1, --batchsize 1   Number of molecules to be processed at a time on a
                        single core
  -o <combined.mlib>, --output <combined.mlib>
                        File to be written to
  -sep SEPARATOR, --separator SEPARATOR
                        Name separator
  --hadd                Add implicit hydrogen atoms wherever possible.
  --obopt [ff maxiter tol disp ...]
                        Perform openbabel optimization on the fly. This
                        accepts up to 4 arguments. Arg 1: the forcefield
                        (uff/mmff94/gaff/ghemical). Arg 2: is the max number
                        of steps (default=500). Arg 3: energy convergence
                        criterion (default=1e-4) Arg 4: geometry displacement
                        (default=False) but values ~0.01-0.1 can help escape
                        planarity.
  --overwrite           Overwrite the target files if they exist (default is
                        false)

Compile

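compile gathers multiple source files, given as a list of files or a glob pattern, into a single molli collection. Both molecule libraries and conformer libraries can be produced.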

[21]:
!molli compile -h
usage: molli compile [-h] -o LIB_FILE [-t {molecule,ensemble}]
                     [-p {openbabel,obabel,molli}] [--stem] [-s] [-v]
                     [--overwrite]
                     [<file_or_glob> ...]

Compile matching files into a molli collection. Both conformer libraries and
molecule libraries are supported.

positional arguments:
  <file_or_glob>        List of source files or a glob pattern.

options:
  -h, --help            show this help message and exit
  -o LIB_FILE, --output LIB_FILE
                        New style collection to be made
  -t {molecule,ensemble}, --type {molecule,ensemble}
                        Type of object to be imported
  -p {openbabel,obabel,molli}, --parser {openbabel,obabel,molli}
                        Parser to be used to import the molecule object
  --stem                Renames the conformer ensemble to match the file stem
  -s, --split           This is only compatible with the choice of type
                        `molecule`. In this case all files are treated as
                        multi-molecule files
  -v, --verbose         Increase the amount of output
  --overwrite           Overwrite the destination collection

GBCA

gbca allows calculation of some of the grid-based descriptors. A more in-depth description of the command and its applications can be found in the cookbook.

[22]:
!molli gbca -h
usage: molli gbca [-h] [-w] [-n 128] [-b 128] [-g <grid.hdf5>]
                  [-o <lib_aso.hdf5>] [--dtype DTYPE] [--overwrite]
                  {aso,aeif} CLIB_FILE

This module can be used for standalone computation of descriptors

positional arguments:
  {aso,aeif}            This selects the specific descriptor to compute.
  CLIB_FILE             Conformer library to perform the calculation on

options:
  -h, --help            show this help message and exit
  -w, --weighted        Apply the weights specified in the conformer files
  -n 128, --nprocs 128  Selects number of processors for python
                        multiprocessing application. If the program is
                        launched via MPI backend, this parameter is ignored.
  -b 128, --batchsize 128
                        Number of conformer ensembles to be processed in one
                        batch.
  -g <grid.hdf5>, --grid <grid.hdf5>
                        File that contains the information about the
                        gridpoints.
  -o <lib_aso.hdf5>, --output <lib_aso.hdf5>
                        File that contains the information about the
                        gridpoints.
  --dtype DTYPE         Specify the data format to be used for grid parameter
                        storage.
  --overwrite           Overwrite the existing descriptor file

Grid

grid allows rectangular grid calculation of an existing molecule or conformer library with a variety of parameters. This is expanded upon in the cookbook.
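
The roles of the padding and spacing parameters can be illustrated with a small pure-Python sketch: take the bounding box of a set of coordinates, pad it on every side, and place points at regular intervals. This is only an illustration of the idea; how molli handles the far edge of the box or orders the points may differ, and `rectangular_grid` is a hypothetical helper.

```python
import math
from itertools import product


def rectangular_grid(coords, padding=0.0, spacing=1.0):
    """Return grid points covering the padded bounding box of `coords`.

    `coords` is a sequence of (x, y, z) tuples in angstroms.
    """
    axes = []
    for i in range(3):
        lo = min(c[i] for c in coords) - padding
        hi = max(c[i] for c in coords) + padding
        # Number of points that fit between lo and hi at the given spacing;
        # the small epsilon guards against floating-point round-off.
        n = math.floor((hi - lo) / spacing + 1e-9) + 1
        axes.append([lo + k * spacing for k in range(n)])
    return list(product(*axes))


atoms = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]  # made-up coordinates
print(len(rectangular_grid(atoms)))
print(len(rectangular_grid(atoms, padding=0.5)))
```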

[23]:
!molli grid -h
usage: molli grid [-h] [-o <fpath>] [-n NPROCS] [-p 0.0] [-s 1.0]
                  [-b BATCHSIZE] [--prune [<max_dist>:<eps>]]
                  [--nearest [NEAREST]] [--overwrite] [--dtype DTYPE]
                  library

Read a molli library and calculate a grid

positional arguments:
  library               Conformer library file to perform the calculations on

options:
  -h, --help            show this help message and exit
  -o <fpath>, --output <fpath>
                        Destination for calculation results
  -n NPROCS, --nprocs NPROCS
                        Specifies the number of jobs for constructing a grid
  -p 0.0, --padding 0.0
                        The bounding box will be padded by this many angstroms
                        prior to grid construction
  -s 1.0, --spacing 1.0
                        Intervals at which the grid points will be placed
  -b BATCHSIZE, --batchsize BATCHSIZE
                        Number of molecules to be treated simulateneously
  --prune [<max_dist>:<eps>]
                        Obtain the pruning indices for each conformer ensemble
  --nearest [NEAREST]   Obtain nearest atom indices for conformer ensembles.
                        This is necessary for indicator field descriptors.
                        Accepts up to 1 parameter which corresponds to the
                        cutoff distance.
  --overwrite           Overwrite the existing grid file
  --dtype DTYPE         Specify the data format to be used for grid parameter
                        storage.

List Names

ls allows access to a list of names in an existing conformer library or molecule library.

[24]:
!molli ls -h
usage: molli ls [-h] [-t {mlib,clib,cdxml}] [-a [ATTRIB ...]] input

Read a molli library and list its contents.

positional arguments:
  input                 Collection to inspect. If type is not specified, it
                        will be deduced from file extensions or directory
                        properties.

options:
  -h, --help            show this help message and exit
  -t {mlib,clib,cdxml}, --type {mlib,clib,cdxml}
                        Collection type
  -a [ATTRIB ...], --attrib [ATTRIB ...]
                        Attributes to report. At least one must be specified.
                        Attributes are accessed via `getattr` function.
                        Possible options: `n_atoms`, `n_bonds`,
                        `n_attachment_points`, `n_conformers`
                        `molecular_weight`, `formula`. If none specified, only
                        the indexes will be returned.
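
The help text above states that attributes are accessed via `getattr`. A minimal sketch of that kind of lookup is given here, using a stand-in class instead of real molli objects; `FakeMolecule`, its `name` field and `report` are hypothetical and only mimic the attribute names listed in the help.

```python
from dataclasses import dataclass


@dataclass
class FakeMolecule:
    """Stand-in for a library entry; not a molli class."""
    name: str
    n_atoms: int
    n_bonds: int


def report(items, attribs):
    """Return one row per item: its name followed by the requested attributes."""
    return [(item.name, *[getattr(item, a) for a in attribs]) for item in items]


library = [FakeMolecule("benzene", 12, 12), FakeMolecule("methane", 5, 4)]
for row in report(library, ["n_atoms", "n_bonds"]):
    print(*row)
```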

Parse

parse reads a cdxml file directly into a molecule library. By default it does not perceive implicit hydrogens, but these can be added with the --hadd option.

[25]:
!molli parse -h
usage: molli parse [-h] [-f {cdxml}] [-o <fpath>] [--hadd] [--overwrite] file

This package parses chemical files, such as .cdxml, and creates a collection
of molecules in .mlib format.

positional arguments:
  file                  File to be parsed.

options:
  -h, --help            show this help message and exit
  -f {cdxml}, --format {cdxml}
                        Override the source file format. Defaults to the file
                        extension. Supported types: 'cdxml'
  -o <fpath>, --output <fpath>
                        Destination for .MLIB output
  --hadd                Add implicit hydrogen atoms wherever possible. By
                        default this only affects elements in groups 13-17.
  --overwrite           Overwrite the target files if they exist (default is
                        false)

Recollect

recollect reads Molecule Library files, Conformer Library files, Zip Files, Molli 0.2 (Legacy) Zip Files, and Directories of molecules or conformer ensembles, and converts them to another of these collection types.

If file formats other than MOL2 or XYZ need to be read, one can use the interface molli has with openbabel. Note: openbabel is not a dependency of molli and can be installed via conda install openbabel.

Example 1 Conformer Library to SDF Directory

molli recollect -it clib -i example.clib -l obabel -o example_sdf_dir -ot dir -oext sdf

This would read the ConformerLibrary file, using openbabel as the parser, and create a directory “example_sdf_dir” that contains multi-SDF files based on the ConformerEnsemble objects in the Conformer Library.

Example 2 Zipfile to Molecule Library

molli recollect -it zip -i example_mol2s.zip -iext mol2 -l molli -o example.mlib -ot mlib

This would read from an existing zip file using molli to parse the files as MOL2. This would then be written to a Molecule Library file example.mlib.
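
When many conversions need to be scripted, the command line can be assembled in Python and handed to `subprocess`. The sketch below restricts itself to flags that appear in the `molli recollect -h` output, including `-l/--library` for the parsing backend; the helper names `recollect_command` and `run_if_available` are hypothetical, and the command is only executed if a `molli` executable is found on the PATH and the input exists.

```python
import os
import shlex
import shutil
import subprocess


def recollect_command(input_path, input_type, output_path, output_type,
                      *, input_ext=None, output_ext=None, library="molli"):
    """Build the argument list for `molli recollect` from the flags in its help."""
    cmd = ["molli", "recollect",
           "-i", input_path, "-it", input_type,
           "-o", output_path, "-ot", output_type,
           "-l", library]
    if input_ext is not None:
        cmd += ["-iext", input_ext]
    if output_ext is not None:
        cmd += ["-oext", output_ext]
    return cmd


def run_if_available(cmd, input_path):
    """Run the command only when molli is installed and the input exists."""
    if shutil.which(cmd[0]) is None or not os.path.exists(input_path):
        return None
    return subprocess.run(cmd, check=True)


# Mirrors Example 2: a zip file of MOL2 files converted to a Molecule Library.
cmd = recollect_command("example_mol2s.zip", "zip", "example.mlib", "mlib",
                        input_ext="mol2")
print(shlex.join(cmd))
run_if_available(cmd, "example_mol2s.zip")
```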

[26]:
!molli recollect -h
usage: molli recollect [-h] [-i <PATH>] [-it {mlib,clib,dir,zip}]
                       [-iext INPUT_EXT] [-iconv {molecule,ensemble}]
                       [-o <PATH>] [-ot {mlib,clib,dir,zip}]
                       [-oext OUTPUT_EXT] [-l {molli,obabel,openbabel}]
                       [-cm 0 1] [-v] [-s] [--overwrite]

Read old style molli collection and convert it to the new file format.

options:
  -h, --help            show this help message and exit
  -i <PATH>, --input <PATH>
                        This is the input path
  -it {mlib,clib,dir,zip}, --input_type {mlib,clib,dir,zip}
                        This is the input type, including <mlib>, <.clib>,
                        <.zip>, <.xml>, <.ukv>, or directory (<dir>)
  -iext INPUT_EXT, --input_ext INPUT_EXT
                        This option is required if reading from a <zip> or
                        directory to indicate the File Type being searched for
                        (<mol2>, <xyz>, etc.)
  -iconv {molecule,ensemble}, --input_conv {molecule,ensemble}
                        This option is required if reading from a <zip> or
                        directory to indicate if the files being read should
                        be read in as a Molecule or ConformerEnsemble
  -o <PATH>, --output <PATH>
                        This is the output path
  -ot {mlib,clib,dir,zip}, --output_type {mlib,clib,dir,zip}
                        New style collection, either with or without
                        conformers
  -oext OUTPUT_EXT, --output_ext OUTPUT_EXT
                        This option is required if reading from a <zip> or
                        directory to indicate the File Type being searched for
                        (<mol2>, <xyz>, etc.)
  -l {molli,obabel,openbabel}, --library {molli,obabel,openbabel}
                        This indicates the type of library to utilize,
                        defaults to molli, but openbabel can be specified if
                        non xyz/mol2 formats are used. In the event a file
                        format without connectivity is utilized, such as xyz,
                        the molli parser will not create/perceive
                        connectivity, while the openbabel parser will
                        connect/perceive bond orders.
  -cm 0 1, --charge_mult 0 1
                        Assign these charge and multiplicity to the imported
                        molecules
  -v, --verbose         Increase the amount of output
  -s, --skip            This option enables skipping malformed files within
                        old collections. Warnings will be printed.
  --overwrite           This option enables overwriting the destination
                        collection.

Show

show allows visualization of a molecule, or of a molecule within a molecule library, via pyvista directly from the command line.

[27]:
!molli show -h
usage: molli show [-h] [-p PROGRAM] [-o OUTPUT] [-ot OTYPE]
                  [--bgcolor BGCOLOR] [--port PORT] [--parser PARSER]
                  [--no_confs]
                  library_or_mol [key]

Show a molecule in a GUI of choice

positional arguments:
  library_or_mol        This can be a molecule file or a Load all these
                        molecules from this library
  key                   Molecule to be shown. Only applies if the
                        `library_or_mol` argument is a molli collection.

options:
  -h, --help            show this help message and exit
  -p PROGRAM, --program PROGRAM
                        Run this command to get to gui. Special cases:
                        `pyvista`, `3dmol.js`, `http-3dmol.js`. Others are
                        interpreted as command path.
  -o OUTPUT, --output OUTPUT
                        If any temporary visualization files are producted,
                        they will be written in this destination. User is then
                        responsible for destrying those. If not specified,
                        temporary files will be created.
  -ot OTYPE, --otype OTYPE
                        Output temporary file type. defaults to `mol2`
  --bgcolor BGCOLOR     If the visualization software supports, set this color
                        as background color.
  --port PORT           If the visualization protocol requires to fire up a
                        server, this will be the port of choice.
  --parser PARSER       If the visualization requires to load an arbitrary
                        file, this parser will be used to parse out the file.
  --no_confs            Does not display all conformers of the molecule.

Stats

stats calculates various statistics over a molecule or conformer library using an “expression” that is evaluated with the local variable m bound to each object. For example, to get the statistics associated with the number of conformers in a conformer library, one could use

molli stats "m.n_conformers" example.clib -t clib

This returns not only the number of ensembles in the library, but also the mean, standard deviation, minimum, IQR1, median, IQR3, and maximum.
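
The idea of evaluating an expression against a local variable `m` can be sketched with plain Python. This is an illustration under assumptions, not molli's implementation: the objects are `SimpleNamespace` stand-ins, `collection_stats` is a hypothetical helper, and the quartiles are left out for brevity. As with any use of `eval`, only trusted expressions should be passed in.

```python
import statistics
from types import SimpleNamespace


def collection_stats(expression, objects):
    """Evaluate `expression` once per object, with the object bound to `m`."""
    code = compile(expression, "<expression>", "eval")
    values = [eval(code, {}, {"m": obj}) for obj in objects]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


# Stand-ins for conformer ensembles with made-up conformer counts.
ensembles = [SimpleNamespace(n_conformers=n) for n in (3, 5, 8, 12)]
print(collection_stats("m.n_conformers", ensembles))
```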

[28]:
!molli stats -h
usage: molli stats [-h] [-t {mlib,clib}] [-o OUTPUT] expression input

Calculate statistics on the collection

positional arguments:
  expression            What to count. Expression is evaluated with the local
                        variable `m` that corresponds to the object.
  input                 Collection to inspect. If type is not specified, it
                        will be deduced from file extensions or directory
                        properties.

options:
  -h, --help            show this help message and exit
  -t {mlib,clib}, --type {mlib,clib}
                        Collection type
  -o OUTPUT, --output OUTPUT
                        Output the results as a space-separated file

Test

test runs all the unit tests available in molli. This will skip tests associated with openbabel and rdkit if they are not installed.

[29]:
!molli test -h
usage: molli test [-h] [-v] [-q] [--locals] [-f] [-c] [-b]
                  [-k TESTNAMEPATTERNS]
                  [tests ...]

positional arguments:
  tests                a list of any number of test modules, classes and test
                       methods.

options:
  -h, --help           show this help message and exit
  -v, --verbose        Verbose output
  -q, --quiet          Quiet output
  --locals             Show local variables in tracebacks
  -f, --failfast       Stop on first fail or error
  -c, --catch          Catch Ctrl-C and display results so far
  -b, --buffer         Buffer stdout and stderr during tests
  -k TESTNAMEPATTERNS  Only run tests which match the given substring

Examples:
  molli test                           - run default set of tests
  molli test MyTestSuite               - run suite 'MyTestSuite'
  molli test MyTestCase.testSomething  - run MyTestCase.testSomething
  molli test MyTestCase                - run all 'test*' test methods
                                       in MyTestCase

Map

[1]:
!molli map -h
usage: molli combine [-h] -t <lib> [-n 1] [-b 1] [-o <combined.mlib>]
                     [--overwrite]
                     script

Read a molli library and perform some basic inspections

positional arguments:
  script                This is a python file that defines a molli_main
                        function

options:
  -h, --help            show this help message and exit
  -t <lib>, --target <lib>
                        Target library that the function is going to be
                        applied to
  -n 1, --nprocs 1      Number of processes to be used in parallel
  -b 1, --batchsize 1   Number of molecules to be processed at a time on a
                        single core
  -o <combined.mlib>, --output <combined.mlib>
                        Output library
  --overwrite           Overwrite the target files if they exist (default is
                        false)