Welcome to molli
About this tutorial
This notebook illustrates a few fundamental principles of the new molli package. The old and new styles of molli differ starkly, so this introductory tutorial should be useful to experienced users and newcomers alike.
Basic structure of the molli package
Subpackages
[1]:
# This is meant to be as iconic as `import numpy as np` :)
import molli as ml
Command line¶
molli features a number of standalone scripts for standard procedures, such as parsing a .CDXML file or compiling a collection.
[2]:
# This is a shell command
!molli --HELP
usage: molli [-C <file.yml>] [-L <file.log>] [-v] [-H] [-V]
{list,align,combine,compile,gbca,grid,info,ls,parse,recollect,run,show,stats,test}
MOLLI package is an API that intends to create a concise and easy-to-use
syntax that encompasses the needs of cheminformatics (especially so, but not
limited to the workflows developed and used in the Denmark laboratory.
positional arguments:
{list,align,combine,compile,gbca,grid,info,ls,parse,recollect,run,show,stats,test}
This is main command that invokes a specific
standalone routine in MOLLI. To get full explanation
of available commands, run `molli list`
options:
-C <file.yml>, --CONFIG <file.yml>
Sets the file from which molli configuration will be
read from
-L <file.log>, --LOG <file.log>
Sets the file that will contain the output of molli
routines.
-v, --verbose Sets the level of verbosity for molli output.
-H, --HELP show help message and exit
-V, --VERSION show program's version number and exit
[3]:
# This is a shell command
!molli list
molli combine
molli compile
molli gbca
molli grid
molli info
molli ls
molli parse
molli recollect
molli show
molli stats
molli test
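When a workflow needs to call these scripts from a Python program rather than through a notebook shell escape, the standard-library `subprocess` module is one option. The sketch below assumes only that the `molli` executable is on `PATH` and that `molli list` prints one `molli <command>` line per subcommand, as in the output above. The helper names `parse_molli_list` and `available_subcommands` are hypothetical and are not part of molli.

```python
import shutil
import subprocess


def parse_molli_list(output: str) -> list[str]:
    """Pull subcommand names out of `molli list` output (lines like 'molli combine')."""
    names = []
    for line in output.splitlines():
        parts = line.split()
        # Keep only lines of exactly the form "molli <command>".
        if len(parts) == 2 and parts[0] == "molli":
            names.append(parts[1])
    return names


def available_subcommands() -> list[str]:
    """Run `molli list` and parse it; return [] if no `molli` executable is on PATH."""
    if shutil.which("molli") is None:
        return []
    result = subprocess.run(
        ["molli", "list"], capture_output=True, text=True, check=True
    )
    return parse_molli_list(result.stdout)


print(available_subcommands())
```

Separating parsing from execution keeps the text handling testable without molli installed.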
Basic objects
molli provides classes for constructing and representing arbitrary chemical entities. These objects can be built entirely programmatically or by importing data from a saved file.
Molecule
Molecules can be instantiated in a few key ways. For example, a mol2 file can be loaded from a string with `loads_mol2`, or from a file stream or file path with `load_mol2`:
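file_path = ml.files.benzene_mol2  # example mol2 file shipped with molli
# This function imports a mol2 file from a string
with open(file_path) as f:
    mol2_string = f.read()
mol = ml.Molecule.loads_mol2(mol2_string)
# or, similarly, from a file stream
with open(file_path) as file_io:
    mol = ml.Molecule.load_mol2(file_io)
# or file path
mol = ml.Molecule.load_mol2(file_path)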
Here is an example of this in action:
[4]:
# Example file path available within molli
fpath = ml.files.benzene_mol2
print("Path to a test file", fpath)
# Load a Molecule object from the file path
mol = ml.Molecule.load_mol2(fpath)
print(f'This is the Molecule: {mol}')
print("Here is the molecule as an XYZ File")
print(mol.dumps_xyz())
Path to a test file /home/blakeo2/new_molli/molli_dev/molli/molli/files/benzene.mol2
This is the Molecule: Molecule(name='benzene', formula='C6 H6')
Here is the molecule as an XYZ File
12
benzene
C -2.424200 1.134800 -0.000000
C -3.698300 0.567200 -0.000000
C -3.843800 -0.820000 -0.000000
C -2.715200 -1.639600 -0.000000
C -1.441100 -1.072000 -0.000000
C -1.295600 0.315200 -0.000000
H -2.310800 2.215600 -0.000000
H -4.577600 1.205800 -0.000000
H -4.836500 -1.262200 -0.000000
H -2.828600 -2.720400 -0.000000
H -0.561800 -1.710600 0.000000
H -0.302900 0.757400 0.000000
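To give a feel for what `loads_mol2` and `load_mol2` work with, here is a minimal, stdlib-only sketch that reads the `@<TRIPOS>MOLECULE`, `@<TRIPOS>ATOM`, and `@<TRIPOS>BOND` records of a mol2 file. This is not molli's parser: the function names `parse_mol2` and `read_mol2`, the returned dictionary layout, and taking the element from the SYBYL atom type (the part before the dot, e.g. `C.ar` gives `C`) are all assumptions made for illustration. The string-versus-source split mirrors the `loads_*`/`load_*` naming above, which resembles Python's `json.loads`/`json.load`.

```python
import io
from pathlib import Path
from typing import IO, Union


def parse_mol2(text: str) -> dict:
    """Parse the first molecule in a mol2 string into name, atoms, and bonds."""
    section = None
    name = None
    atoms = []  # (element, x, y, z)
    bonds = []  # (atom_index_1, atom_index_2, bond_type) with 0-based indices
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("@<TRIPOS>"):
            section = line[len("@<TRIPOS>"):]
            if section == "MOLECULE" and name is not None:
                break  # a second molecule begins; only the first is read
            continue
        fields = line.split()
        if section == "MOLECULE" and name is None:
            name = line  # first line of the MOLECULE record is the name
        elif section == "ATOM":
            # atom_id, atom_name, x, y, z, SYBYL type, ...
            element = fields[5].split(".")[0]
            x, y, z = (float(v) for v in fields[2:5])
            atoms.append((element, x, y, z))
        elif section == "BOND":
            # bond_id, origin_atom_id, target_atom_id, bond_type (1-based ids)
            bonds.append((int(fields[1]) - 1, int(fields[2]) - 1, fields[3]))
    return {"name": name, "atoms": atoms, "bonds": bonds}


def read_mol2(source: Union[str, Path, IO[str]]) -> dict:
    """Accept a file path or an open text stream, then delegate to parse_mol2."""
    if isinstance(source, (str, Path)):
        with open(source) as f:
            return parse_mol2(f.read())
    return parse_mol2(source.read())


water = """@<TRIPOS>MOLECULE
water
 3 2 0 0 0
SMALL
NO_CHARGES

@<TRIPOS>ATOM
 1 O1 0.0000 0.0000 0.0000 O.3 1 HOH 0.0000
 2 H1 0.9572 0.0000 0.0000 H 1 HOH 0.0000
 3 H2 -0.2400 0.9266 0.0000 H 1 HOH 0.0000
@<TRIPOS>BOND
 1 1 2 1
 2 1 3 1
"""
mol = read_mol2(io.StringIO(water))  # a stream works as well as a path
print(mol["name"], len(mol["atoms"]), len(mol["bonds"]))  # water 3 2
```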
molli natively reads objects from three formats:
SYBYL_MOL2
XYZ (this will not automatically perceive bonds/connectivity!)
CDXML (this will not automatically perceive hydrogens!)
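Because XYZ files carry only elements and coordinates, connectivity has to be inferred separately when it is needed. A common heuristic (not necessarily what molli or OpenBabel uses) treats two atoms as bonded when their distance is at most the sum of their covalent radii plus a tolerance. The sketch below assumes a small illustrative radius table (approximate single-bond covalent radii in Å) and a 0.4 Å tolerance; `perceive_bonds` is a hypothetical helper, and the heuristic does not assign bond orders.

```python
import math

# Approximate single-bond covalent radii in angstrom (illustrative subset only).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def perceive_bonds(atoms, tolerance=0.4):
    """Return (i, j) pairs of atoms no farther apart than r_i + r_j + tolerance.

    `atoms` is a list of (element, x, y, z) tuples. The all-pairs loop is
    O(n^2), which is fine for small molecules. Bond orders are not assigned.
    """
    bonds = []
    for i in range(len(atoms)):
        elem_i, *pos_i = atoms[i]
        for j in range(i + 1, len(atoms)):
            elem_j, *pos_j = atoms[j]
            cutoff = COVALENT_RADII[elem_i] + COVALENT_RADII[elem_j] + tolerance
            if math.dist(pos_i, pos_j) <= cutoff:
                bonds.append((i, j))
    return bonds


# Coordinates copied from the benzene XYZ output above.
benzene_xyz = """\
C -2.424200 1.134800 -0.000000
C -3.698300 0.567200 -0.000000
C -3.843800 -0.820000 -0.000000
C -2.715200 -1.639600 -0.000000
C -1.441100 -1.072000 -0.000000
C -1.295600 0.315200 -0.000000
H -2.310800 2.215600 -0.000000
H -4.577600 1.205800 -0.000000
H -4.836500 -1.262200 -0.000000
H -2.828600 -2.720400 -0.000000
H -0.561800 -1.710600 0.000000
H -0.302900 0.757400 0.000000
"""
atoms = [
    (e, float(x), float(y), float(z))
    for e, x, y, z in (line.split() for line in benzene_xyz.splitlines() if line.strip())
]
bonds = perceive_bonds(atoms)
print(len(bonds))  # 12 (six C-C and six C-H)
```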
OpenBabel is an essential cheminformatics tool: it unites many formats under one molecular representation, OBMol, which can easily be converted between file formats. We have designed an interface between molli and OpenBabel that allows almost any known chemical format to be imported into molli.
OpenBabel is not a required dependency, however; it must be installed separately to use this functionality (https://github.com/openbabel/openbabel).
An example with the MOL format is shown below:
[5]:
file_path = ml.files.dendrobine_molv3
# This loads the MOL file into molli using OpenBabel
mol = ml.load(file_path, fmt='mol', parser='openbabel', otype="molecule", name='dendrobine')
mol
[5]:
Molecule(name='dendrobine', formula='C16 H25 N1 O2')
ConformerEnsemble
ConformerEnsemble is a fundamental molli class that can be thought of as one baseline set of atoms and bonds paired with a collection of varying coordinates, one set per conformer. Ensembles are loaded in much the same way as Molecule objects:
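file_path = ml.files.pentane_confs_mol2  # example multi-conformer mol2 shipped with molli
# This function imports a mol2 file from a string
with open(file_path) as f:
    mol2_string = f.read()
ens = ml.ConformerEnsemble.loads_mol2(mol2_string)
# or, similarly, from a file stream
with open(file_path) as file_io:
    ens = ml.ConformerEnsemble.load_mol2(file_io)
# or file path
ens = ml.ConformerEnsemble.load_mol2(file_path)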
Here is an example of this in action:
[6]:
file_path = ml.files.pentane_confs_mol2
ens = ml.load(file_path, fmt='mol2', otype='ensemble')
print(f'Here is the ensemble: {ens}')
print('Here are the XYZ coordinates of the full ensemble')
print(ens.dumps_xyz())
Here is the ensemble: ConformerEnsemble(name='pentane', formula='C5 H12', n_conformers=7)
Here are the XYZ coordinates of the full ensemble
17
pentane
C -2.804500 3.996400 -1.412800
C -2.748400 3.317400 -0.053600
H -3.684000 4.644600 -1.476700
H -2.867800 3.257600 -2.218100
H -1.915400 4.612600 -1.580600
C -1.528800 2.404000 0.066300
H -3.665500 2.735900 0.095900
H -2.718500 4.083500 0.729900
C -0.228600 3.184600 -0.124600
H -1.592000 1.606100 -0.683800
H -1.526500 1.921300 1.051200
C -0.089200 4.294900 0.904600
H -0.200100 3.620600 -1.130000
H 0.628700 2.506600 -0.040400
H -0.915000 5.009100 0.825500
H 0.847200 4.839900 0.749600
H -0.081700 3.888900 1.921200
17
pentane
C -2.729800 4.412900 1.000500
C -2.748400 3.317400 -0.053600
H -3.610700 5.054100 0.896400
H -1.838700 5.040500 0.898400
H -2.736800 3.987100 2.009000
C -1.528800 2.404000 0.066300
H -2.773500 3.776500 -1.048800
H -3.666800 2.728900 0.055600
C -0.228600 3.184600 -0.124600
H -1.592000 1.606100 -0.683800
H -1.526500 1.921300 1.051200
C -0.162700 3.829400 -1.499900
H 0.629900 2.514400 -0.000300
H -0.145900 3.963600 0.642100
H -0.210400 3.073600 -2.290300
H 0.774800 4.382300 -1.614700
H -0.990700 4.530900 -1.644300
17
pentane
C -4.043200 2.548300 0.154600
C -2.748400 3.317400 -0.053600
H -4.902100 3.220700 0.064500
H -4.072500 2.091200 1.148800
H -4.152800 1.755200 -0.592000
C -1.528800 2.404000 0.066300
H -2.685200 4.122400 0.687800
H -2.765200 3.788400 -1.043400
C -0.228600 3.184600 -0.124600
H -1.592000 1.606100 -0.683800
H -1.526500 1.921300 1.051200
C 0.986600 2.276700 -0.021500
H -0.153200 3.973500 0.632700
H -0.229000 3.674800 -1.105100
H 1.034400 1.795200 0.960400
H 1.905000 2.855400 -0.161200
H 0.958100 1.494700 -0.787100
17
pentane
C -4.043200 2.548300 0.154600
C -2.748400 3.317400 -0.053600
H -4.902100 3.220700 0.064500
H -4.072500 2.091200 1.148800
H -4.152800 1.755200 -0.592000
C -1.528800 2.404000 0.066300
H -2.685200 4.122400 0.687800
H -2.765200 3.788400 -1.043400
C -0.228600 3.184600 -0.124600
H -1.592000 1.606100 -0.683800
H -1.526500 1.921300 1.051200
C -0.089200 4.294900 0.904600
H -0.200100 3.620600 -1.130000
H 0.628700 2.506600 -0.040400
H -0.915000 5.009100 0.825500
H 0.847200 4.839900 0.749600
H -0.081700 3.888900 1.921200
17
pentane
C -4.043200 2.548300 0.154600
C -2.748400 3.317400 -0.053600
H -4.902100 3.220700 0.064500
H -4.072500 2.091200 1.148800
H -4.152800 1.755200 -0.592000
C -1.528800 2.404000 0.066300
H -2.685200 4.122400 0.687800
H -2.765200 3.788400 -1.043400
C -0.228600 3.184600 -0.124600
H -1.592000 1.606100 -0.683800
H -1.526500 1.921300 1.051200
C -0.162700 3.829400 -1.499900
H 0.629900 2.514400 -0.000300
H -0.145900 3.963600 0.642100
H -0.210400 3.073600 -2.290300
H 0.774800 4.382300 -1.614700
H -0.990700 4.530900 -1.644300
17
pentane
C -2.804500 3.996400 -1.412800
C -2.748400 3.317400 -0.053600
H -3.684000 4.644600 -1.476700
H -2.867800 3.257600 -2.218100
H -1.915400 4.612600 -1.580600
C -1.528800 2.404000 0.066300
H -3.665500 2.735900 0.095900
H -2.718500 4.083500 0.729900
C -0.228600 3.184600 -0.124600
H -1.592000 1.606100 -0.683800
H -1.526500 1.921300 1.051200
C 0.986600 2.276700 -0.021500
H -0.153200 3.973500 0.632700
H -0.229000 3.674800 -1.105100
H 1.034400 1.795200 0.960400
H 1.905000 2.855400 -0.161200
H 0.958100 1.494700 -0.787100
17
pentane
C -2.729800 4.412900 1.000500
C -2.748400 3.317400 -0.053600
H -3.610700 5.054100 0.896400
H -1.838700 5.040500 0.898400
H -2.736800 3.987100 2.009000
C -1.528800 2.404000 0.066300
H -2.773500 3.776500 -1.048800
H -3.666800 2.728900 0.055600
C -0.228600 3.184600 -0.124600
H -1.592000 1.606100 -0.683800
H -1.526500 1.921300 1.051200
C 0.986600 2.276700 -0.021500
H -0.153200 3.973500 0.632700
H -0.229000 3.674800 -1.105100
H 1.034400 1.795200 0.960400
H 1.905000 2.855400 -0.161200
H 0.958100 1.494700 -0.787100
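The text printed by `ens.dumps_xyz()` is one XYZ block per conformer, concatenated: an atom-count line, a comment line (here the molecule name), then one `element x y z` line per atom. A minimal sketch for splitting such text back into frames could look like the following; `split_xyz_frames` is a hypothetical helper written for this illustration, not a molli function, and it assumes well-formed input with no blank lines between frames.

```python
def split_xyz_frames(text):
    """Split concatenated XYZ text into a list of (comment, atoms) frames.

    Each atom is an (element, x, y, z) tuple. Raises ValueError when a frame
    has fewer atom lines than its count line promises.
    """
    lines = text.strip().splitlines()
    frames = []
    pos = 0
    while pos < len(lines):
        n_atoms = int(lines[pos])      # atom-count line
        comment = lines[pos + 1]       # free-form comment / title line
        atoms = []
        for line in lines[pos + 2 : pos + 2 + n_atoms]:
            element, x, y, z = line.split()
            atoms.append((element, float(x), float(y), float(z)))
        if len(atoms) != n_atoms:
            raise ValueError(f"frame starting at line {pos + 1} is truncated")
        frames.append((comment, atoms))
        pos += 2 + n_atoms
    return frames


two_frames = """3
water
O 0.0 0.0 0.0
H 0.9572 0.0 0.0
H -0.24 0.9266 0.0
3
water
O 0.0 0.0 0.1
H 0.9572 0.0 0.1
H -0.24 0.9266 0.1
"""
frames = split_xyz_frames(two_frames)
print(len(frames), frames[1][1][0])  # 2 ('O', 0.0, 0.0, 0.1)
```

Applied to the pentane text above, this helper should return seven frames of 17 atoms each.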
Conformer ensembles can also be instantiated from a list of Molecule objects:
[7]:
file_path = ml.files.pentane_confs_mol2
mols = ml.load_all(file_path, otype='molecule')
print(mols)
ens = ml.ConformerEnsemble(mols)
print(ens.n_conformers)
print(f'Here is the ensemble: {ens}')
[Molecule(name='pentane', formula='C5 H12'), Molecule(name='pentane', formula='C5 H12'), Molecule(name='pentane', formula='C5 H12'), Molecule(name='pentane', formula='C5 H12'), Molecule(name='pentane', formula='C5 H12'), Molecule(name='pentane', formula='C5 H12'), Molecule(name='pentane', formula='C5 H12')]
7
Here is the ensemble: ConformerEnsemble(name='pentane', formula='C5 H12', n_conformers=7)
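Conceptually, conformers share one baseline atom list, so an ensemble built from several molecules can store that list once plus one coordinate set per conformer, provided every molecule has the same atoms in the same order. The sketch below illustrates that idea with hypothetical `SimpleMolecule` and `SimpleEnsemble` dataclasses; it is a simplified model of the concept, not molli's `ConformerEnsemble`, whose internals may differ.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleMolecule:
    name: str
    elements: list  # e.g. ["C", "H", ...]
    coords: list    # one (x, y, z) tuple per atom


@dataclass
class SimpleEnsemble:
    name: str
    elements: list                                  # shared by every conformer
    conformers: list = field(default_factory=list)  # one coordinate list per conformer

    @classmethod
    def from_molecules(cls, mols):
        """Build an ensemble after checking that all molecules share one atom list."""
        if not mols:
            raise ValueError("need at least one molecule")
        ref = mols[0]
        for m in mols[1:]:
            if m.elements != ref.elements:
                raise ValueError(f"{m.name}: atom list differs from {ref.name}")
        return cls(ref.name, list(ref.elements), [list(m.coords) for m in mols])

    @property
    def n_conformers(self):
        return len(self.conformers)


a = SimpleMolecule("h2", ["H", "H"], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.74)])
b = SimpleMolecule("h2", ["H", "H"], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.76)])
toy_ens = SimpleEnsemble.from_molecules([a, b])
print(toy_ens.n_conformers)  # 2
```

Rejecting mismatched atom lists up front keeps every stored coordinate set meaningful against the single shared element list.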
Substructure
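A Substructure draws its atoms and bonds from a parent structure and allows a subset of the parent's atoms to be manipulated in place. In the example below, adding 50.0 to the coordinates of a three-atom substructure also shifts the atoms at indices 1, 3, and 5 of the parent dendrobine molecule.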
[8]:
file_path = ml.files.dendrobine_mol2
dendrobine = ml.load(file_path, otype='molecule', name='dendrobine')
sub = ml.Substructure(dendrobine, (1, 3, 5))
print(sub.dumps_xyz())
sub.coords += 50.0
print(dendrobine.dumps_xyz())
3
<class 'molli.chem.structure.Substructure'>
C 0.057300 -0.022600 2.122700
C -0.428400 -0.411300 -0.168700
C 2.456200 0.441700 1.836100
44
dendrobine
N 1.296000 -0.231900 1.267000
C 50.057300 49.977400 52.122700
C -1.097400 -0.473800 1.205900
C 49.571600 49.588700 49.831300
C 0.868300 0.359800 -0.000400
C 52.456200 50.441700 51.836100
C -2.432800 0.265900 0.983200
C -2.655300 0.247200 -0.580100
C -1.242800 0.562600 -1.016300
C 1.535300 0.579000 -1.417900
O 1.341000 2.036300 -1.650700
C 0.078200 2.229100 -2.216600
C -0.597700 0.853900 -2.402000
O -0.413500 3.318000 -2.455200
C 0.760200 0.167800 -2.700100
C 0.781200 -1.214100 -3.377300
H 1.242600 0.768400 -3.498200
H -1.179400 1.558000 -0.537700
H 0.651600 1.395200 0.317400
H -1.372200 -1.491000 1.513400
C -0.282800 -1.883500 -0.595700
C 2.177500 -1.841200 -3.321500
C 0.348500 -1.100600 -4.851400
H 0.059000 -1.900900 -2.947800
H -0.064900 1.026400 2.419800
H 0.116200 -0.640800 3.024400
H 2.715000 -0.004900 2.801900
H 2.292700 1.514700 1.987400
H 3.325500 0.316200 1.181800
H -2.371600 1.296200 1.353400
H -3.263900 -0.226100 1.498200
H -3.381000 1.006900 -0.883600
H -2.993400 -0.735800 -0.922200
H 2.608800 0.379800 -1.439400
H -1.292900 0.877200 -3.242300
H -0.307500 -2.585400 0.247500
H 0.692000 -2.089000 -1.031900
H -1.094300 -2.190400 -1.261400
H 2.179500 -2.824100 -3.804900
H 2.517600 -1.982200 -2.291400
H 2.914300 -1.213500 -3.834300
H 0.352200 -2.085900 -5.330700
H 1.023200 -0.452000 -5.421000
H -0.665200 -0.696700 -4.934600
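The output above shows that adding 50.0 to `sub.coords` moved only the atoms at indices 1, 3, and 5 of `dendrobine`: the substructure behaves like a view onto the parent's coordinates rather than a copy. The pure-Python sketch below models that behaviour with a hypothetical `CoordView` class that holds a reference to the parent coordinate list plus an index tuple. It illustrates the idea only and is not necessarily how molli implements `Substructure`.

```python
class CoordView:
    """Expose a subset of a parent's coordinates; edits write through to the parent."""

    def __init__(self, parent_coords, indices):
        self.parent = parent_coords  # list of [x, y, z] rows, shared, not copied
        self.indices = tuple(indices)

    @property
    def coords(self):
        # Return copies so reading the view cannot accidentally alias parent rows.
        return [list(self.parent[i]) for i in self.indices]

    def translate(self, dx, dy, dz):
        """Shift only the selected atoms, writing the result into the parent."""
        for i in self.indices:
            x, y, z = self.parent[i]
            self.parent[i] = [x + dx, y + dy, z + dz]


parent = [[0.0, 0.0, 0.0] for _ in range(6)]
toy_sub = CoordView(parent, (1, 3, 5))
toy_sub.translate(50.0, 50.0, 50.0)
print([row[0] for row in parent])  # [0.0, 50.0, 0.0, 50.0, 0.0, 50.0]
```

If coordinates are stored in a NumPy array instead, note that integer-array indexing such as `coords[[1, 3, 5]]` returns a copy, so a view-like object has to write back through the parent (for example `coords[idx] += 50.0`) rather than hold on to the indexed result.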