{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# `molli` Command Line \n", "\n", "## About this tutorial\n", "This file is meant to illustrate a few fundamental principles of the new molli package. The difference between old and new style molli is stark, therefore this introductory tutorial will be useful for both experienced people and newcomers." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic structure of `molli` package.\n", "\n", "### Subpackages" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# This is meant to be as iconic as `import numpy as np` :)\n", "import molli as ml" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Command line\n", "\n", "`molli` features a number of standalone scripts for standard procedures, such as parsing a .CDXML file, or for compiling a collection." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: molli [-C ] [-L ] [-v] [-H] [-V]\n", " {list,align,combine,compile,gbca,grid,info,ls,parse,recollect,run,show,stats,test}\n", "\n", "MOLLI package is an API that intends to create a concise and easy-to-use\n", "syntax that encompasses the needs of cheminformatics (especially so, but not\n", "limited to the workflows developed and used in the Denmark laboratory.\n", "\n", "positional arguments:\n", " {list,align,combine,compile,gbca,grid,info,ls,parse,recollect,run,show,stats,test}\n", " This is main command that invokes a specific\n", " standalone routine in MOLLI. To get full explanation\n", " of available commands, run `molli list`\n", "\n", "options:\n", " -C , --CONFIG \n", " Sets the file from which molli configuration will be\n", " read from\n", " -L , --LOG \n", " Sets the file that will contain the output of molli\n", " routines.\n", " -v, --verbose Sets the level of verbosity for molli output.\n", " -H, --HELP show help message and exit\n", " -V, --VERSION show program's version number and exit\n" ] } ], "source": [ "# This is a shell command\n", "!molli --HELP" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32mmolli align\n", "\u001b[0m\u001b[32mmolli combine\n", "\u001b[0m\u001b[32mmolli compile\n", "\u001b[0m\u001b[32mmolli gbca\n", "\u001b[0m\u001b[32mmolli grid\n", "\u001b[0m\u001b[32mmolli info\n", "\u001b[0m\u001b[32mmolli ls\n", "\u001b[0m\u001b[32mmolli parse\n", "\u001b[0m\u001b[32mmolli recollect\n", "\u001b[0m\u001b[32mmolli show\n", "\u001b[0m\u001b[32mmolli stats\n", "\u001b[0m\u001b[32mmolli test\n", "\u001b[0m" ] } ], "source": [ "# This is a shell command\n", "!molli list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Align\n", "\n", "`align` allows for alignment of molecule libraries or conformer libraries based on a \"Query\" mol2 file. This can be a minimum substructure that exists within a library. Note: This requires the `rmsd` and `pandas` packages, which are currently not dependencies of molli. These can be added via `pip install rmsd` and `pip install pandas` OR `conda install rmsd` and `conda install pandas` respectively." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: molli align [-h] -i INPUT -q query_mol.mol2 [--rmsd {rmsd,scipy}]\n", " [-o ] [-s STATS]\n", "\n", "Read a conformer library and align it across given query\n", "\n", "options:\n", " -h, --help show this help message and exit\n", " -i INPUT, --input INPUT\n", " ConformerLibrary/MoleculeLibrary file to align\n", " -q query_mol.mol2, --query query_mol.mol2\n", " Mol2 file with the reference query structure\n", " --rmsd {rmsd,scipy} Method of rmsd calculation. Available are the default\n", " and scipy\n", " -o , --output \n", " Output file path and name w/o extension\n", " -s STATS, --stats STATS\n", " True/False flag to save alignment statistics in the\n", " separate file. Defaults to False.\n" ] } ], "source": [ "!molli align -h" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Combine\n", "\n", "`combine` allows combinatorial expansion of a library. One can view the `core` as being a full of base structures with certain attachment points. The substituents can be appended at different attachemnt points and with different methods depending on the values chosen." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: molli combine [-h] -s \n", " [-m {same,permutns,combns,combns_repl}]\n", " [-a ATTACHMENT_POINTS] [-n 1] [-b 1] -o \n", " [-sep SEPARATOR] [--hadd]\n", " [--obopt [ff maxiter tol disp ...]] [--overwrite]\n", " cores\n", "\n", "Combines two lists of molecules together\n", "\n", "positional arguments:\n", " cores Base library file to combine wth substituents\n", "\n", "options:\n", " -h, --help show this help message and exit\n", " -s , --substituents \n", " Substituents to add at each attachment of a core file\n", " -m {same,permutns,combns,combns_repl}, --mode {same,permutns,combns,combns_repl}\n", " Method for combining substituents\n", " -a ATTACHMENT_POINTS, --attachment_points ATTACHMENT_POINTS\n", " Label used to find attachment points\n", " -n 1, --nprocs 1 Number of processes to be used in parallel\n", " -b 1, --batchsize 1 Number of molecules to be processed at a time on a\n", " single core\n", " -o , --output \n", " File to be written to\n", " -sep SEPARATOR, --separator SEPARATOR\n", " Name separator\n", " --hadd Add implicit hydrogen atoms wherever possible.\n", " --obopt [ff maxiter tol disp ...]\n", " Perform openbabel optimization on the fly. This\n", " accepts up to 4 arguments. Arg 1: the forcefield\n", " (uff/mmff94/gaff/ghemical). Arg 2: is the max number\n", " of steps (default=500). Arg 3: energy convergence\n", " criterion (default=1e-4) Arg 4: geometry displacement\n", " (default=False) but values ~0.01-0.1 can help escape\n", " planarity.\n", " --overwrite Overwrite the target files if they exist (default is\n", " false)\n" ] } ], "source": [ "!molli combine -h" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compile\n", "\n", "`compile` allows multiple libraries to be combined into one." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: molli compile [-h] -o LIB_FILE [-t {molecule,ensemble}]\n", " [-p {openbabel,obabel,molli}] [--stem] [-s] [-v]\n", " [--overwrite]\n", " [ ...]\n", "\n", "Compile matching files into a molli collection. Both conformer libraries and\n", "molecule libraries are supported.\n", "\n", "positional arguments:\n", " List of source files or a glob pattern.\n", "\n", "options:\n", " -h, --help show this help message and exit\n", " -o LIB_FILE, --output LIB_FILE\n", " New style collection to be made\n", " -t {molecule,ensemble}, --type {molecule,ensemble}\n", " Type of object to be imported\n", " -p {openbabel,obabel,molli}, --parser {openbabel,obabel,molli}\n", " Parser to be used to import the molecule object\n", " --stem Renames the conformer ensemble to match the file stem\n", " -s, --split This is only compatible with the choice of type\n", " `molecule`. In this case all files are treated as\n", " multi-molecule files\n", " -v, --verbose Increase the amount of output\n", " --overwrite Overwrite the destination collection\n" ] } ], "source": [ "!molli compile -h" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## GBCA\n", "\n", "`gbca` allows calculation of some of the grid-based descriptors. A more in-depth description of the command and its applications can be found in the cookbook." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: molli gbca [-h] [-w] [-n 128] [-b 128] [-g ]\n", " [-o ] [--dtype DTYPE] [--overwrite]\n", " {aso,aeif} CLIB_FILE\n", "\n", "This module can be used for standalone computation of descriptors\n", "\n", "positional arguments:\n", " {aso,aeif} This selects the specific descriptor to compute.\n", " CLIB_FILE Conformer library to perform the calculation on\n", "\n", "options:\n", " -h, --help show this help message and exit\n", " -w, --weighted Apply the weights specified in the conformer files\n", " -n 128, --nprocs 128 Selects number of processors for python\n", " multiprocessing application. If the program is\n", " launched via MPI backend, this parameter is ignored.\n", " -b 128, --batchsize 128\n", " Number of conformer ensembles to be processed in one\n", " batch.\n", " -g , --grid \n", " File that contains the information about the\n", " gridpoints.\n", " -o , --output \n", " File that contains the information about the\n", " gridpoints.\n", " --dtype DTYPE Specify the data format to be used for grid parameter\n", " storage.\n", " --overwrite Overwrite the existing descriptor file\n" ] } ], "source": [ "!molli gbca -h" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Grid\n", "\n", "`grid` allows rectangular grid calculation of an existing molecule or conformer library with a variety of parameters. This is expanded upon in the cookbook." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: molli grid [-h] [-o ] [-n NPROCS] [-p 0.0] [-s 1.0]\n", " [-b BATCHSIZE] [--prune [:]]\n", " [--nearest [NEAREST]] [--overwrite] [--dtype DTYPE]\n", " library\n", "\n", "Read a molli library and calculate a grid\n", "\n", "positional arguments:\n", " library Conformer library file to perform the calculations on\n", "\n", "options:\n", " -h, --help show this help message and exit\n", " -o , --output \n", " Destination for calculation results\n", " -n NPROCS, --nprocs NPROCS\n", " Specifies the number of jobs for constructing a grid\n", " -p 0.0, --padding 0.0\n", " The bounding box will be padded by this many angstroms\n", " prior to grid construction\n", " -s 1.0, --spacing 1.0\n", " Intervals at which the grid points will be placed\n", " -b BATCHSIZE, --batchsize BATCHSIZE\n", " Number of molecules to be treated simulateneously\n", " --prune [:]\n", " Obtain the pruning indices for each conformer ensemble\n", " --nearest [NEAREST] Obtain nearest atom indices for conformer ensembles.\n", " This is necessary for indicator field descriptors.\n", " Accepts up to 1 parameter which corresponds to the\n", " cutoff distance.\n", " --overwrite Overwrite the existing grid file\n", " --dtype DTYPE Specify the data format to be used for grid parameter\n", " storage.\n" ] } ], "source": [ "!molli grid -h" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## List Names\n", "\n", "`ls` allows access to a list of names in an existing conformer library or molecule library." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: molli ls [-h] [-t {mlib,clib,cdxml}] [-a [ATTRIB ...]] input\n", "\n", "Read a molli library and list its contents.\n", "\n", "positional arguments:\n", " input Collection to inspect. If type is not specified, it\n", " will be deduced from file extensions or directory\n", " properties.\n", "\n", "options:\n", " -h, --help show this help message and exit\n", " -t {mlib,clib,cdxml}, --type {mlib,clib,cdxml}\n", " Collection type\n", " -a [ATTRIB ...], --attrib [ATTRIB ...]\n", " Attributes to report. At least one must be specified.\n", " Attributes are accessed via `getattr` function.\n", " Possible options: `n_atoms`, `n_bonds`,\n", " `n_attachment_points`, `n_conformers`\n", " `molecular_weight`, `formula`. If none specified, only\n", " the indexes will be returned.\n" ] } ], "source": [ "!molli ls -h" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Parse\n", "\n", "`parse` allows direct reading from a cdxml file to a molecule library. This by default does not perceive implicit hydrogens, but these can be added with the `hadd` option." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: molli parse [-h] [-f {cdxml}] [-o ] [--hadd] [--overwrite] file\n", "\n", "This package parses chemical files, such as .cdxml, and creates a collection\n", "of molecules in .mlib format.\n", "\n", "positional arguments:\n", " file File to be parsed.\n", "\n", "options:\n", " -h, --help show this help message and exit\n", " -f {cdxml}, --format {cdxml}\n", " Override the source file format. Defaults to the file\n", " extension. Supported types: 'cdxml'\n", " -o , --output \n", " Destination for .MLIB output\n", " --hadd Add implicit hydrogen atoms wherever possible. By\n", " default this only affects elements in groups 13-17.\n", " --overwrite Overwrite the target files if they exist (default is\n", " false)\n" ] } ], "source": [ "!molli parse -h" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Recollect\n", "\n", "`recollect` allows reading in of Molecule Library files, Conformer Library files, Zip Files, Molli 0.2 (Legacy) Zip Files, and Directories of molecules or conformer ensembles. \n", "\n", "In the event that files outside of `MOL2` or `XYZ` need to be read, one can use `openbabel` to leverage the interface `molli` has with this. Note: `openbabel` is not a dependency of `molli` and can be installed via `conda install openbabel`.\n", "\n", "### Example 1 Conformer Library to SDF Directory\n", "\n", "`molli recollect -it clib -i example.clib -p obabel -o example_sdf_dir -ot dir -oext sdf`\n", "\n", "This would read from the `ConformerLibrary` file using `openbabel` to parse this to create a directory \"example_sdf_dir\" which contains multi-SDF files based on the `ConformerEnsemble` objects in the `Conformer Library\n", "\n", "### Example 2 Zipfile to Molecule Library\n", "\n", "`molli recollect -it zip -i example_mol2s.zip -iext mol2 -p molli -o example.mlib -ot mlib`\n", "\n", "This would read from an existing zip file using `molli` to parse the files as `MOL2`. This would then be written to a Molecule Library file `example.mlib`. " ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: molli recollect [-h] [-i ] [-it {mlib,clib,dir,zip}]\n", " [-iext INPUT_EXT] [-iconv {molecule,ensemble}]\n", " [-o ] [-ot {mlib,clib,dir,zip}]\n", " [-oext OUTPUT_EXT] [-l {molli,obabel,openbabel}]\n", " [-cm 0 1] [-v] [-s] [--overwrite]\n", "\n", "Read old style molli collection and convert it to the new file format.\n", "\n", "options:\n", " -h, --help show this help message and exit\n", " -i , --input \n", " This is the input path\n", " -it {mlib,clib,dir,zip}, --input_type {mlib,clib,dir,zip}\n", " This is the input type, including , <.clib>,\n", " <.zip>, <.xml>, <.ukv>, or directory ()\n", " -iext INPUT_EXT, --input_ext INPUT_EXT\n", " This option is required if reading from a or\n", " directory to indicate the File Type being searched for\n", " (, , etc.)\n", " -iconv {molecule,ensemble}, --input_conv {molecule,ensemble}\n", " This option is required if reading from a or\n", " directory to indicate if the files being read should\n", " be read in as a Molecule or ConformerEnsemble\n", " -o , --output \n", " This is the output path\n", " -ot {mlib,clib,dir,zip}, --output_type {mlib,clib,dir,zip}\n", " New style collection, either with or without\n", " conformers\n", " -oext OUTPUT_EXT, --output_ext OUTPUT_EXT\n", " This option is required if reading from a or\n", " directory to indicate the File Type being searched for\n", " (, , etc.)\n", " -l {molli,obabel,openbabel}, --library {molli,obabel,openbabel}\n", " This indicates the type of library to utilize,\n", " defaults to molli, but openbabel can be specified if\n", " non xyz/mol2 formats are used. In the event a file\n", " format without connectivity is utilized, such as xyz,\n", " the molli parser will not create/perceive\n", " connectivity, while the openbabel parser will\n", " connect/perceive bond orders.\n", " -cm 0 1, --charge_mult 0 1\n", " Assign these charge and multiplicity to the imported\n", " molecules\n", " -v, --verbose Increase the amount of output\n", " -s, --skip This option enables skipping malformed files within\n", " old collections. Warnings will be printed.\n", " --overwrite This option enables overwriting the destination\n", " collection.\n" ] } ], "source": [ "!molli recollect -h" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Show\n", "\n", "`show` allows visualization via pyvista of a molecule or a molecule within a molecule library via pyvista directly from the command line." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: molli show [-h] [-p PROGRAM] [-o OUTPUT] [-ot OTYPE]\n", " [--bgcolor BGCOLOR] [--port PORT] [--parser PARSER]\n", " [--no_confs]\n", " library_or_mol [key]\n", "\n", "Show a molecule in a GUI of choice\n", "\n", "positional arguments:\n", " library_or_mol This can be a molecule file or a Load all these\n", " molecules from this library\n", " key Molecule to be shown. Only applies if the\n", " `library_or_mol` argument is a molli collection.\n", "\n", "options:\n", " -h, --help show this help message and exit\n", " -p PROGRAM, --program PROGRAM\n", " Run this command to get to gui. Special cases:\n", " `pyvista`, `3dmol.js`, `http-3dmol.js`. Others are\n", " interpreted as command path.\n", " -o OUTPUT, --output OUTPUT\n", " If any temporary visualization files are producted,\n", " they will be written in this destination. User is then\n", " responsible for destrying those. If not specified,\n", " temporary files will be created.\n", " -ot OTYPE, --otype OTYPE\n", " Output temporary file type. defaults to `mol2`\n", " --bgcolor BGCOLOR If the visualization software supports, set this color\n", " as background color.\n", " --port PORT If the visualization protocol requires to fire up a\n", " server, this will be the port of choice.\n", " --parser PARSER If the visualization requires to load an arbitrary\n", " file, this parser will be used to parse out the file.\n", " --no_confs Does not display all conformers of the molecule.\n" ] } ], "source": [ "!molli show -h" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Stats\n", "\n", "`stats` allows for various statistics to be calculated within molecule or conformer libraries using an \"expression\" associated with the local variable `m`. For example, if I wanted to get the statistics associated with the number of conformers in a conformer library, I could use\n", "\n", "`molli stats \"m.n_conformers\" example.clib -t clib`\n", "\n", "This returns not only the number of ensembles in the library, but the mean, standard deviation, minimum, IQR1, median, IQR3, and maximum." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: molli stats [-h] [-t {mlib,clib}] [-o OUTPUT] expression input\n", "\n", "Calculate statistics on the collection\n", "\n", "positional arguments:\n", " expression What to count. Expression is evaluated with the local\n", " variable `m` that corresponds to the object.\n", " input Collection to inspect. If type is not specified, it\n", " will be deduced from file extensions or directory\n", " properties.\n", "\n", "options:\n", " -h, --help show this help message and exit\n", " -t {mlib,clib}, --type {mlib,clib}\n", " Collection type\n", " -o OUTPUT, --output OUTPUT\n", " Output the results as a space-separated file\n" ] } ], "source": [ "!molli stats -h " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Test\n", "\n", "`test` runs all the unit tests available in `molli`. This will skip tests associated with `openbabel` and `rdkit` if they are not installed." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: molli test [-h] [-v] [-q] [--locals] [-f] [-c] [-b]\n", " [-k TESTNAMEPATTERNS]\n", " [tests ...]\n", "\n", "positional arguments:\n", " tests a list of any number of test modules, classes and test\n", " methods.\n", "\n", "options:\n", " -h, --help show this help message and exit\n", " -v, --verbose Verbose output\n", " -q, --quiet Quiet output\n", " --locals Show local variables in tracebacks\n", " -f, --failfast Stop on first fail or error\n", " -c, --catch Catch Ctrl-C and display results so far\n", " -b, --buffer Buffer stdout and stderr during tests\n", " -k TESTNAMEPATTERNS Only run tests which match the given substring\n", "\n", "Examples:\n", " molli test - run default set of tests\n", " molli test MyTestSuite - run suite 'MyTestSuite'\n", " molli test MyTestCase.testSomething - run MyTestCase.testSomething\n", " molli test MyTestCase - run all 'test*' test methods\n", " in MyTestCase\n", "\n" ] } ], "source": [ "!molli test -h" ] } ], "metadata": { "kernelspec": { "display_name": "dev-blake", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.6" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }