molli Command Line
About this tutorial
This file illustrates a few fundamental principles of the new molli package. The difference between old-style and new-style molli is stark, so this introductory tutorial should be useful to both experienced users and newcomers.
Basic structure of the molli package
Subpackages
[16]:
# This is meant to be as iconic as `import numpy as np` :)
import molli as ml
Command line
molli features a number of standalone scripts for standard procedures, such as parsing a .cdxml file or compiling a collection.
[17]:
# This is a shell command
!molli --HELP
usage: molli [-C <file.yml>] [-L <file.log>] [-v] [-H] [-V]
{list,align,combine,compile,gbca,grid,info,ls,parse,recollect,run,show,stats,test}
MOLLI package is an API that intends to create a concise and easy-to-use
syntax that encompasses the needs of cheminformatics (especially so, but not
limited to the workflows developed and used in the Denmark laboratory.
positional arguments:
{list,align,combine,compile,gbca,grid,info,ls,parse,recollect,run,show,stats,test}
This is main command that invokes a specific
standalone routine in MOLLI. To get full explanation
of available commands, run `molli list`
options:
-C <file.yml>, --CONFIG <file.yml>
Sets the file from which molli configuration will be
read from
-L <file.log>, --LOG <file.log>
Sets the file that will contain the output of molli
routines.
-v, --verbose Sets the level of verbosity for molli output.
-H, --HELP show help message and exit
-V, --VERSION show program's version number and exit
[18]:
# This is a shell command
!molli list
molli align
molli combine
molli compile
molli gbca
molli grid
molli info
molli ls
molli parse
molli recollect
molli show
molli stats
molli test
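The same listing can be captured from Python, for example to check which commands an installed version provides. The following is a minimal sketch: the helper name list_molli_commands is hypothetical, and it assumes only that a molli executable is on PATH and that molli list prints one command per line, as in the output above.

```python
import shutil
import subprocess


def list_molli_commands():
    """Return the lines printed by `molli list`, or None if molli is not on PATH."""
    if shutil.which("molli") is None:
        return None
    result = subprocess.run(["molli", "list"], capture_output=True, text=True)
    return result.stdout.splitlines()


commands = list_molli_commands()
print(commands if commands is not None else "molli executable not found")
```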
Align
align aligns a molecule library or conformer library to a “query” mol2 file. The query can be a minimal substructure that exists within the library. Note: this requires the rmsd and pandas packages, which are currently not dependencies of molli. They can be installed with pip install rmsd and pip install pandas, or with conda install rmsd and conda install pandas.
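Before running the command, it can help to confirm that both optional packages are importable. This is a small sketch using only the standard library; the function name missing_optional_packages is hypothetical and is not part of molli.

```python
import importlib.util


def missing_optional_packages(names=("rmsd", "pandas")):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_optional_packages()
if missing:
    print("Install before running `molli align`:", ", ".join(missing))
else:
    print("rmsd and pandas are both importable.")
```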
[19]:
!molli align -h
usage: molli align [-h] -i INPUT -q query_mol.mol2 [--rmsd {rmsd,scipy}]
[-o <aligned>] [-s STATS]
Read a conformer library and align it across given query
options:
-h, --help show this help message and exit
-i INPUT, --input INPUT
ConformerLibrary/MoleculeLibrary file to align
-q query_mol.mol2, --query query_mol.mol2
Mol2 file with the reference query structure
--rmsd {rmsd,scipy} Method of rmsd calculation. Available are the default
and scipy
-o <aligned>, --output <aligned>
Output file path and name w/o extension
-s STATS, --stats STATS
True/False flag to save alignment statistics in the
separate file. Defaults to False.
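For readers unfamiliar with the quantity being minimized, the sketch below computes a plain root-mean-square deviation between two matched sets of coordinates. It is an illustration only and is not molli's implementation: it assumes the atoms are already paired in order and performs no superposition step, whereas an actual alignment would first rotate and translate one structure onto the other.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Hypothetical query substructure and a copy shifted by 1 angstrom along z.
query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x, y, z + 1.0) for x, y, z in query]
print(rmsd(query, shifted))
```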
Combine
combine performs combinatorial expansion of a library. The cores file can be viewed as a library of base structures with designated attachment points. Substituents can be appended at the different attachment points, and the way they are combined depends on the mode chosen.
[20]:
!molli combine -h
usage: molli combine [-h] -s <substituents.mlib>
[-m {same,permutns,combns,combns_repl}]
[-a ATTACHMENT_POINTS] [-n 1] [-b 1] -o <combined.mlib>
[-sep SEPARATOR] [--hadd]
[--obopt [ff maxiter tol disp ...]] [--overwrite]
cores
Combines two lists of molecules together
positional arguments:
cores Base library file to combine wth substituents
options:
-h, --help show this help message and exit
-s <substituents.mlib>, --substituents <substituents.mlib>
Substituents to add at each attachment of a core file
-m {same,permutns,combns,combns_repl}, --mode {same,permutns,combns,combns_repl}
Method for combining substituents
-a ATTACHMENT_POINTS, --attachment_points ATTACHMENT_POINTS
Label used to find attachment points
-n 1, --nprocs 1 Number of processes to be used in parallel
-b 1, --batchsize 1 Number of molecules to be processed at a time on a
single core
-o <combined.mlib>, --output <combined.mlib>
File to be written to
-sep SEPARATOR, --separator SEPARATOR
Name separator
--hadd Add implicit hydrogen atoms wherever possible.
--obopt [ff maxiter tol disp ...]
Perform openbabel optimization on the fly. This
accepts up to 4 arguments. Arg 1: the forcefield
(uff/mmff94/gaff/ghemical). Arg 2: is the max number
of steps (default=500). Arg 3: energy convergence
criterion (default=1e-4) Arg 4: geometry displacement
(default=False) but values ~0.01-0.1 can help escape
planarity.
--overwrite Overwrite the target files if they exist (default is
false)
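The mode names suggest the standard combinatorial enumerations. The sketch below shows what each would produce for a core with two attachment points, under the assumption (inferred from the option names only, not confirmed by the help text) that same places one substituent on every attachment point and that permutns, combns, and combns_repl correspond to permutations, combinations, and combinations with replacement. The function name and the substituent labels are hypothetical.

```python
from itertools import combinations, combinations_with_replacement, permutations


def enumerate_assignments(substituents, n_points, mode):
    """List the substituent tuples placed on `n_points` attachment points.

    The mode semantics here are an assumption based on the option names only.
    """
    if mode == "same":
        return [(s,) * n_points for s in substituents]
    if mode == "permutns":
        return list(permutations(substituents, n_points))
    if mode == "combns":
        return list(combinations(substituents, n_points))
    if mode == "combns_repl":
        return list(combinations_with_replacement(substituents, n_points))
    raise ValueError(f"unknown mode: {mode}")


subs = ["Me", "Ph", "tBu"]
for mode in ("same", "permutns", "combns", "combns_repl"):
    print(mode, len(enumerate_assignments(subs, 2, mode)))
```

Under these assumptions, three substituents on two attachment points give 3, 6, 3, and 6 products per core, respectively, which is a quick way to estimate the size of the output library before running the command.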
Compile
compile combines multiple source files, given as a list or a glob pattern, into a single molli collection.
[21]:
!molli compile -h
usage: molli compile [-h] -o LIB_FILE [-t {molecule,ensemble}]
[-p {openbabel,obabel,molli}] [--stem] [-s] [-v]
[--overwrite]
[<file_or_glob> ...]
Compile matching files into a molli collection. Both conformer libraries and
molecule libraries are supported.
positional arguments:
<file_or_glob> List of source files or a glob pattern.
options:
-h, --help show this help message and exit
-o LIB_FILE, --output LIB_FILE
New style collection to be made
-t {molecule,ensemble}, --type {molecule,ensemble}
Type of object to be imported
-p {openbabel,obabel,molli}, --parser {openbabel,obabel,molli}
Parser to be used to import the molecule object
--stem Renames the conformer ensemble to match the file stem
-s, --split This is only compatible with the choice of type
`molecule`. In this case all files are treated as
multi-molecule files
-v, --verbose Increase the amount of output
--overwrite Overwrite the destination collection
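To preview which files a glob pattern would pick up, and what names the --stem option would derive from them, one can list the matching file stems. This is a sketch with the standard library only; the file names are made up, and it assumes that the stem is the file name without its final extension, as in pathlib.

```python
import tempfile
from pathlib import Path


def matching_stems(directory, pattern):
    """Return the sorted file stems of the files in `directory` that match `pattern`."""
    return sorted(path.stem for path in Path(directory).glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical source files; only the .mol2 files should match.
    for name in ("cat_1.mol2", "cat_2.mol2", "notes.txt"):
        (Path(tmp) / name).touch()
    stems = matching_stems(tmp, "*.mol2")
    print(stems)
```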
GBCA
gbca allows calculation of some of the grid-based descriptors. A more in-depth description of the command and its applications can be found in the cookbook.
[22]:
!molli gbca -h
usage: molli gbca [-h] [-w] [-n 128] [-b 128] [-g <grid.hdf5>]
[-o <lib_aso.hdf5>] [--dtype DTYPE] [--overwrite]
{aso,aeif} CLIB_FILE
This module can be used for standalone computation of descriptors
positional arguments:
{aso,aeif} This selects the specific descriptor to compute.
CLIB_FILE Conformer library to perform the calculation on
options:
-h, --help show this help message and exit
-w, --weighted Apply the weights specified in the conformer files
-n 128, --nprocs 128 Selects number of processors for python
multiprocessing application. If the program is
launched via MPI backend, this parameter is ignored.
-b 128, --batchsize 128
Number of conformer ensembles to be processed in one
batch.
-g <grid.hdf5>, --grid <grid.hdf5>
File that contains the information about the
gridpoints.
-o <lib_aso.hdf5>, --output <lib_aso.hdf5>
File that contains the information about the
gridpoints.
--dtype DTYPE Specify the data format to be used for grid parameter
storage.
--overwrite Overwrite the existing descriptor file
Grid
grid calculates a rectangular grid for an existing molecule or conformer library, with a variety of parameters. This is expanded upon in the cookbook.
[23]:
!molli grid -h
usage: molli grid [-h] [-o <fpath>] [-n NPROCS] [-p 0.0] [-s 1.0]
[-b BATCHSIZE] [--prune [<max_dist>:<eps>]]
[--nearest [NEAREST]] [--overwrite] [--dtype DTYPE]
library
Read a molli library and calculate a grid
positional arguments:
library Conformer library file to perform the calculations on
options:
-h, --help show this help message and exit
-o <fpath>, --output <fpath>
Destination for calculation results
-n NPROCS, --nprocs NPROCS
Specifies the number of jobs for constructing a grid
-p 0.0, --padding 0.0
The bounding box will be padded by this many angstroms
prior to grid construction
-s 1.0, --spacing 1.0
Intervals at which the grid points will be placed
-b BATCHSIZE, --batchsize BATCHSIZE
Number of molecules to be treated simulateneously
--prune [<max_dist>:<eps>]
Obtain the pruning indices for each conformer ensemble
--nearest [NEAREST] Obtain nearest atom indices for conformer ensembles.
This is necessary for indicator field descriptors.
Accepts up to 1 parameter which corresponds to the
cutoff distance.
--overwrite Overwrite the existing grid file
--dtype DTYPE Specify the data format to be used for grid parameter
storage.
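The roles of the padding and spacing options can be illustrated with a small pure-Python sketch. It is not molli's implementation: it assumes that the grid fills the axis-aligned bounding box of the coordinates, extended by the padding on every side, with points placed at multiples of the spacing starting from the lower corner. The function names and the coordinates are hypothetical.

```python
import math
from itertools import product


def axis_points(lo, hi, spacing):
    """Evenly spaced values from lo up to (at most) hi."""
    n_steps = math.floor((hi - lo) / spacing + 1e-9)
    return [lo + i * spacing for i in range(n_steps + 1)]


def rectangular_grid(coords, padding=0.0, spacing=1.0):
    """Grid points filling the padded bounding box of `coords` (list of (x, y, z))."""
    axes = []
    for axis in range(3):
        values = [c[axis] for c in coords]
        axes.append(axis_points(min(values) - padding, max(values) + padding, spacing))
    return list(product(*axes))


# Two hypothetical atom positions, in angstroms.
atoms = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
grid = rectangular_grid(atoms, padding=1.0, spacing=1.0)
print(len(grid))
```

With these inputs the padded box spans 3 by 4 by 2 angstroms, so the sketch produces 4 x 5 x 3 = 60 grid points. Halving the spacing multiplies the point count roughly eightfold, which is worth keeping in mind when choosing -s.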
List Names
ls lists the names in an existing conformer library or molecule library.
[24]:
!molli ls -h
usage: molli ls [-h] [-t {mlib,clib,cdxml}] [-a [ATTRIB ...]] input
Read a molli library and list its contents.
positional arguments:
input Collection to inspect. If type is not specified, it
will be deduced from file extensions or directory
properties.
options:
-h, --help show this help message and exit
-t {mlib,clib,cdxml}, --type {mlib,clib,cdxml}
Collection type
-a [ATTRIB ...], --attrib [ATTRIB ...]
Attributes to report. At least one must be specified.
Attributes are accessed via `getattr` function.
Possible options: `n_atoms`, `n_bonds`,
`n_attachment_points`, `n_conformers`
`molecular_weight`, `formula`. If none specified, only
the indexes will be returned.
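The help text notes that attributes are accessed via getattr. The following sketch shows that lookup pattern on stand-in objects. FakeEntry and report are hypothetical names and are not molli classes or functions; only the attribute names n_atoms and n_bonds come from the help output above.

```python
from dataclasses import dataclass


@dataclass
class FakeEntry:
    """Hypothetical stand-in for a library entry; not a molli class."""
    name: str
    n_atoms: int
    n_bonds: int


def report(entries, attribs):
    """Build one row per entry: its name followed by the requested attributes."""
    return [[e.name] + [getattr(e, a) for a in attribs] for e in entries]


entries = [FakeEntry("benzene", 12, 12), FakeEntry("methane", 5, 4)]
for row in report(entries, ["n_atoms", "n_bonds"]):
    print(*row)
```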
Parse
parse reads a cdxml file directly into a molecule library. By default it does not perceive implicit hydrogens, but these can be added with the --hadd option.
[25]:
!molli parse -h
usage: molli parse [-h] [-f {cdxml}] [-o <fpath>] [--hadd] [--overwrite] file
This package parses chemical files, such as .cdxml, and creates a collection
of molecules in .mlib format.
positional arguments:
file File to be parsed.
options:
-h, --help show this help message and exit
-f {cdxml}, --format {cdxml}
Override the source file format. Defaults to the file
extension. Supported types: 'cdxml'
-o <fpath>, --output <fpath>
Destination for .MLIB output
--hadd Add implicit hydrogen atoms wherever possible. By
default this only affects elements in groups 13-17.
--overwrite Overwrite the target files if they exist (default is
false)
Recollect
recollect reads Molecule Library files, Conformer Library files, zip files, molli 0.2 (legacy) zip files, and directories of molecules or conformer ensembles.
If file formats other than MOL2 or XYZ need to be read, molli's interface to openbabel can be used. Note: openbabel is not a dependency of molli and can be installed via conda install openbabel.
Example 1: Conformer Library to SDF Directory
molli recollect -it clib -i example.clib -l obabel -o example_sdf_dir -ot dir -oext sdf
This reads the Conformer Library file, uses openbabel as the parser, and creates a directory “example_sdf_dir” that contains multi-SDF files based on the ConformerEnsemble objects in the Conformer Library.
Example 2: Zip File to Molecule Library
molli recollect -it zip -i example_mol2s.zip -iext mol2 -l molli -o example.mlib -ot mlib
This reads an existing zip file, using molli to parse the files as MOL2, and writes the result to a Molecule Library file, example.mlib.
[26]:
!molli recollect -h
usage: molli recollect [-h] [-i <PATH>] [-it {mlib,clib,dir,zip}]
[-iext INPUT_EXT] [-iconv {molecule,ensemble}]
[-o <PATH>] [-ot {mlib,clib,dir,zip}]
[-oext OUTPUT_EXT] [-l {molli,obabel,openbabel}]
[-cm 0 1] [-v] [-s] [--overwrite]
Read old style molli collection and convert it to the new file format.
options:
-h, --help show this help message and exit
-i <PATH>, --input <PATH>
This is the input path
-it {mlib,clib,dir,zip}, --input_type {mlib,clib,dir,zip}
This is the input type, including <mlib>, <.clib>,
<.zip>, <.xml>, <.ukv>, or directory (<dir>)
-iext INPUT_EXT, --input_ext INPUT_EXT
This option is required if reading from a <zip> or
directory to indicate the File Type being searched for
(<mol2>, <xyz>, etc.)
-iconv {molecule,ensemble}, --input_conv {molecule,ensemble}
This option is required if reading from a <zip> or
directory to indicate if the files being read should
be read in as a Molecule or ConformerEnsemble
-o <PATH>, --output <PATH>
This is the output path
-ot {mlib,clib,dir,zip}, --output_type {mlib,clib,dir,zip}
New style collection, either with or without
conformers
-oext OUTPUT_EXT, --output_ext OUTPUT_EXT
This option is required if reading from a <zip> or
directory to indicate the File Type being searched for
(<mol2>, <xyz>, etc.)
-l {molli,obabel,openbabel}, --library {molli,obabel,openbabel}
This indicates the type of library to utilize,
defaults to molli, but openbabel can be specified if
non xyz/mol2 formats are used. In the event a file
format without connectivity is utilized, such as xyz,
the molli parser will not create/perceive
connectivity, while the openbabel parser will
connect/perceive bond orders.
-cm 0 1, --charge_mult 0 1
Assign these charge and multiplicity to the imported
molecules
-v, --verbose Increase the amount of output
-s, --skip This option enables skipping malformed files within
old collections. Warnings will be printed.
--overwrite This option enables overwriting the destination
collection.
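When many conversions have to be scripted, it can be convenient to assemble the argument list in Python and pass it to subprocess. The sketch below only builds and prints the command and does not run it. The helper name recollect_command is hypothetical, and it uses only the long option names that appear in the help output above.

```python
import shlex


def recollect_command(input_path, input_type, output_path, output_type,
                      input_ext=None, output_ext=None, library="molli"):
    """Assemble a `molli recollect` argument list from the long option names."""
    args = ["molli", "recollect",
            "--input", input_path, "--input_type", input_type,
            "--output", output_path, "--output_type", output_type,
            "--library", library]
    if input_ext is not None:
        args += ["--input_ext", input_ext]
    if output_ext is not None:
        args += ["--output_ext", output_ext]
    return args


# Conformer library to a directory of SDF files, parsed through openbabel.
cmd = recollect_command("example.clib", "clib", "example_sdf_dir", "dir",
                        output_ext="sdf", library="obabel")
print(shlex.join(cmd))
```

Passing the list form to subprocess.run avoids shell quoting problems with paths that contain spaces.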
Show
show visualizes a molecule, or a molecule within a molecule library, via pyvista directly from the command line.
[27]:
!molli show -h
usage: molli show [-h] [-p PROGRAM] [-o OUTPUT] [-ot OTYPE]
[--bgcolor BGCOLOR] [--port PORT] [--parser PARSER]
[--no_confs]
library_or_mol [key]
Show a molecule in a GUI of choice
positional arguments:
library_or_mol This can be a molecule file or a Load all these
molecules from this library
key Molecule to be shown. Only applies if the
`library_or_mol` argument is a molli collection.
options:
-h, --help show this help message and exit
-p PROGRAM, --program PROGRAM
Run this command to get to gui. Special cases:
`pyvista`, `3dmol.js`, `http-3dmol.js`. Others are
interpreted as command path.
-o OUTPUT, --output OUTPUT
If any temporary visualization files are producted,
they will be written in this destination. User is then
responsible for destrying those. If not specified,
temporary files will be created.
-ot OTYPE, --otype OTYPE
Output temporary file type. defaults to `mol2`
--bgcolor BGCOLOR If the visualization software supports, set this color
as background color.
--port PORT If the visualization protocol requires to fire up a
server, this will be the port of choice.
--parser PARSER If the visualization requires to load an arbitrary
file, this parser will be used to parse out the file.
--no_confs Does not display all conformers of the molecule.
Stats
stats calculates various statistics within molecule or conformer libraries using an “expression” evaluated with the local variable m. For example, to get statistics on the number of conformers in a conformer library, one could use
molli stats "m.n_conformers" example.clib -t clib
This returns not only the number of ensembles in the library, but also the mean, standard deviation, minimum, IQR1, median, IQR3, and maximum.
[28]:
!molli stats -h
usage: molli stats [-h] [-t {mlib,clib}] [-o OUTPUT] expression input
Calculate statistics on the collection
positional arguments:
expression What to count. Expression is evaluated with the local
variable `m` that corresponds to the object.
input Collection to inspect. If type is not specified, it
will be deduced from file extensions or directory
properties.
options:
-h, --help show this help message and exit
-t {mlib,clib}, --type {mlib,clib}
Collection type
-o OUTPUT, --output OUTPUT
Output the results as a space-separated file
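The idea of evaluating an expression with a local variable m and then summarizing the values can be sketched with the standard library. This is an illustration, not molli's code: the objects are hypothetical stand-ins that expose only n_conformers, and the choice of sample standard deviation and of the inclusive quartile method is an assumption, so the exact numbers molli reports may differ.

```python
import statistics
from types import SimpleNamespace


def summarize(objects, expression):
    """Evaluate `expression` for each object bound to `m`, then summarize the values."""
    values = [eval(expression, {"__builtins__": {}}, {"m": obj}) for obj in objects]
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical ensembles that expose only an `n_conformers` attribute.
library = [SimpleNamespace(n_conformers=n) for n in (4, 8, 15, 16, 23, 42)]
print(summarize(library, "m.n_conformers"))
```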
Test
test runs all the unit tests available in molli. This will skip tests associated with openbabel and rdkit if they are not installed.
[29]:
!molli test -h
usage: molli test [-h] [-v] [-q] [--locals] [-f] [-c] [-b]
[-k TESTNAMEPATTERNS]
[tests ...]
positional arguments:
tests a list of any number of test modules, classes and test
methods.
options:
-h, --help show this help message and exit
-v, --verbose Verbose output
-q, --quiet Quiet output
--locals Show local variables in tracebacks
-f, --failfast Stop on first fail or error
-c, --catch Catch Ctrl-C and display results so far
-b, --buffer Buffer stdout and stderr during tests
-k TESTNAMEPATTERNS Only run tests which match the given substring
Examples:
molli test - run default set of tests
molli test MyTestSuite - run suite 'MyTestSuite'
molli test MyTestCase.testSomething - run MyTestCase.testSomething
molli test MyTestCase - run all 'test*' test methods
in MyTestCase
Map
[1]:
!molli map -h
usage: molli combine [-h] -t <lib> [-n 1] [-b 1] [-o <combined.mlib>]
[--overwrite]
script
Read a molli library and perform some basic inspections
positional arguments:
script This is a python file that defines a molli_main
function
options:
-h, --help show this help message and exit
-t <lib>, --target <lib>
Target library that the function is going to be
applied to
-n 1, --nprocs 1 Number of processes to be used in parallel
-b 1, --batchsize 1 Number of molecules to be processed at a time on a
single core
-o <combined.mlib>, --output <combined.mlib>
Output library
--overwrite Overwrite the target files if they exist (default is
false)
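The -b option sets how many molecules are processed at a time on a single core. The sketch below shows one way such batching can be expressed. It is an assumption about the general idea rather than a description of molli's internals, and the function name batches and the molecule keys are hypothetical.

```python
from itertools import islice


def batches(keys, batchsize):
    """Yield successive lists of at most `batchsize` keys."""
    iterator = iter(keys)
    while chunk := list(islice(iterator, batchsize)):
        yield chunk


# Five hypothetical molecule keys split into batches of two.
for chunk in batches(["m1", "m2", "m3", "m4", "m5"], 2):
    print(chunk)
```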