Welcome to molli

About this tutorial

This tutorial illustrates a few fundamental principles of the new molli package. New-style molli differs starkly from the old one, so this introduction should be useful to experienced users and newcomers alike.

Basic structure of the molli package

Subpackages

[1]:
# This is meant to be as iconic as `import numpy as np` :)
import molli as ml

Command line

molli features a number of standalone scripts for standard procedures, such as parsing a .CDXML file or compiling a collection.

[2]:
# This is a shell command
!molli --HELP
usage: molli [-C <file.yml>] [-L <file.log>] [-v] [-H] [-V]
             {list,align,combine,compile,gbca,grid,info,ls,parse,recollect,run,show,stats,test}

MOLLI package is an API that intends to create a concise and easy-to-use
syntax that encompasses the needs of cheminformatics (especially so, but not
limited to the workflows developed and used in the Denmark laboratory.

positional arguments:
  {list,align,combine,compile,gbca,grid,info,ls,parse,recollect,run,show,stats,test}
                        This is main command that invokes a specific
                        standalone routine in MOLLI. To get full explanation
                        of available commands, run `molli list`

options:
  -C <file.yml>, --CONFIG <file.yml>
                        Sets the file from which molli configuration will be
                        read from
  -L <file.log>, --LOG <file.log>
                        Sets the file that will contain the output of molli
                        routines.
  -v, --verbose         Sets the level of verbosity for molli output.
  -H, --HELP            show help message and exit
  -V, --VERSION         show program's version number and exit
[3]:
# This is a shell command
!molli list
molli combine
molli compile
molli gbca
molli grid
molli info
molli ls
molli parse
molli recollect
molli show
molli stats
molli test

Basic objects

molli provides classes for constructing and representing arbitrary chemical entities. Objects can be built entirely programmatically or loaded from a saved file.

Molecule

Molecules can be instantiated in a few key ways. Here are three ways to load a mol2 file:

# This function imports a mol2 file from a string
mol = ml.Molecule.loads_mol2(mol2_string)

# or, similarly, from a file stream
mol = ml.Molecule.load_mol2(file_io)

# or file path
mol = ml.Molecule.load_mol2(file_path)

Here is an example of this in action:

[4]:
# Example file path available within molli
fpath = ml.files.benzene_mol2
print("Path to a test file", fpath)

# Load a Molecule object from the file path
mol = ml.Molecule.load_mol2(fpath)
print(f'This is the Molecule: {mol}')
print("Here is the molecule as an XYZ File")
print(mol.dumps_xyz())
Path to a test file /home/blakeo2/new_molli/molli_dev/molli/molli/files/benzene.mol2
This is the Molecule: Molecule(name='benzene', formula='C6 H6')
Here is the molecule as an XYZ File
12
benzene
C        -2.424200     1.134800    -0.000000
C        -3.698300     0.567200    -0.000000
C        -3.843800    -0.820000    -0.000000
C        -2.715200    -1.639600    -0.000000
C        -1.441100    -1.072000    -0.000000
C        -1.295600     0.315200    -0.000000
H        -2.310800     2.215600    -0.000000
H        -4.577600     1.205800    -0.000000
H        -4.836500    -1.262200    -0.000000
H        -2.828600    -2.720400    -0.000000
H        -0.561800    -1.710600     0.000000
H        -0.302900     0.757400     0.000000

molli is natively built to read in objects from three distinct formats:

SYBYL_MOL2

XYZ (this will not automatically perceive bonds/connectivity!)

CDXML (this will not automatically perceive hydrogens!)
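
The XYZ caveat follows from the format itself: an XYZ block holds an atom count, a comment line, and one element-plus-coordinates line per atom, with no bond records at all. The sketch below is a minimal pure-Python reader, not molli's parser; `parse_xyz` is a hypothetical helper, and the input is the benzene XYZ text printed above.

```python
def parse_xyz(text):
    """Parse one XYZ block into (comment, [(element, x, y, z), ...]).

    Hypothetical helper for illustration only; this is not molli's parser.
    """
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])          # line 1: number of atoms
    comment = lines[1].strip()       # line 2: free-form comment (here, the name)
    atoms = []
    for line in lines[2:2 + n_atoms]:
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError("atom count does not match the header")
    return comment, atoms


BENZENE_XYZ = """12
benzene
C        -2.424200     1.134800    -0.000000
C        -3.698300     0.567200    -0.000000
C        -3.843800    -0.820000    -0.000000
C        -2.715200    -1.639600    -0.000000
C        -1.441100    -1.072000    -0.000000
C        -1.295600     0.315200    -0.000000
H        -2.310800     2.215600    -0.000000
H        -4.577600     1.205800    -0.000000
H        -4.836500    -1.262200    -0.000000
H        -2.828600    -2.720400    -0.000000
H        -0.561800    -1.710600     0.000000
H        -0.302900     0.757400     0.000000
"""

name, atoms = parse_xyz(BENZENE_XYZ)
print(name, len(atoms))  # benzene 12
```

Everything the parser recovers is an element symbol and a position; any bonds would have to be inferred afterwards, for example from interatomic distances.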

Open Babel is an essential tool in cheminformatics. It unites many formats under a single molecular structure, OBMol, which can easily be converted between file formats. molli includes an interface to Open Babel that allows imports from almost any known chemical format.

Open Babel is not a required dependency, however; it must be installed separately to use this functionality (https://github.com/openbabel/openbabel).

An example with the MOL format is shown below:

[5]:
file_path = ml.files.dendrobine_molv3

# This loads the MOLFILE into molli using Open Babel
mol = ml.load(file_path, fmt='mol', parser='openbabel', otype="molecule", name='dendrobine')
mol
[5]:
Molecule(name='dendrobine', formula='C16 H25 N1 O2')
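
Because Open Babel is optional, a script that relies on `parser='openbabel'` may want to check for it first and fail with a clear message. The following is a sketch under stated assumptions: it assumes the Open Babel Python bindings are importable under the module name `openbabel`, and `openbabel_available` and `load_with_openbabel` are hypothetical helpers, not part of molli. The `ml.load` call repeats the arguments used in the cell above.

```python
import importlib.util


def openbabel_available():
    """Return True if a module named `openbabel` can be found (assumed binding name)."""
    return importlib.util.find_spec("openbabel") is not None


def load_with_openbabel(path, fmt, name):
    """Hypothetical wrapper: load a molecule through Open Babel or fail early."""
    if not openbabel_available():
        raise ImportError(
            "Open Babel is not installed; it is needed to read this format"
        )
    # Imported lazily so the availability check itself does not need molli.
    import molli as ml

    # Same keyword arguments as in the cell above.
    return ml.load(path, fmt=fmt, parser="openbabel", otype="molecule", name=name)
```

With molli and Open Babel both installed, `load_with_openbabel(ml.files.dendrobine_molv3, "mol", "dendrobine")` would be expected to behave like the cell above.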

ConformerEnsemble

This fundamental molli class can be thought of as a single set of atoms and bonds paired with multiple sets of coordinates, one per conformer. Ensembles can be loaded in much the same way as Molecule objects:

# This function imports a mol2 file from a string
ens = ml.ConformerEnsemble.loads_mol2(mol2_string)

# or, similarly, from a file stream
ens = ml.ConformerEnsemble.load_mol2(file_io)

# or file path
ens = ml.ConformerEnsemble.load_mol2(file_path)

Here is an example of this in action:

[6]:
file_path = ml.files.pentane_confs_mol2

ens = ml.load(file_path, fmt='mol2', otype='ensemble')
print(f'Here is the ensemble: {ens}')
print('Here are the XYZ coordinates of the full ensemble')
print(ens.dumps_xyz())
Here is the ensemble: ConformerEnsemble(name='pentane', formula='C5 H12', n_conformers=7)
Here are the XYZ coordinates of the full ensemble
17
pentane
C        -2.804500     3.996400    -1.412800
C        -2.748400     3.317400    -0.053600
H        -3.684000     4.644600    -1.476700
H        -2.867800     3.257600    -2.218100
H        -1.915400     4.612600    -1.580600
C        -1.528800     2.404000     0.066300
H        -3.665500     2.735900     0.095900
H        -2.718500     4.083500     0.729900
C        -0.228600     3.184600    -0.124600
H        -1.592000     1.606100    -0.683800
H        -1.526500     1.921300     1.051200
C        -0.089200     4.294900     0.904600
H        -0.200100     3.620600    -1.130000
H         0.628700     2.506600    -0.040400
H        -0.915000     5.009100     0.825500
H         0.847200     4.839900     0.749600
H        -0.081700     3.888900     1.921200
17
pentane
C        -2.729800     4.412900     1.000500
C        -2.748400     3.317400    -0.053600
H        -3.610700     5.054100     0.896400
H        -1.838700     5.040500     0.898400
H        -2.736800     3.987100     2.009000
C        -1.528800     2.404000     0.066300
H        -2.773500     3.776500    -1.048800
H        -3.666800     2.728900     0.055600
C        -0.228600     3.184600    -0.124600
H        -1.592000     1.606100    -0.683800
H        -1.526500     1.921300     1.051200
C        -0.162700     3.829400    -1.499900
H         0.629900     2.514400    -0.000300
H        -0.145900     3.963600     0.642100
H        -0.210400     3.073600    -2.290300
H         0.774800     4.382300    -1.614700
H        -0.990700     4.530900    -1.644300
17
pentane
C        -4.043200     2.548300     0.154600
C        -2.748400     3.317400    -0.053600
H        -4.902100     3.220700     0.064500
H        -4.072500     2.091200     1.148800
H        -4.152800     1.755200    -0.592000
C        -1.528800     2.404000     0.066300
H        -2.685200     4.122400     0.687800
H        -2.765200     3.788400    -1.043400
C        -0.228600     3.184600    -0.124600
H        -1.592000     1.606100    -0.683800
H        -1.526500     1.921300     1.051200
C         0.986600     2.276700    -0.021500
H        -0.153200     3.973500     0.632700
H        -0.229000     3.674800    -1.105100
H         1.034400     1.795200     0.960400
H         1.905000     2.855400    -0.161200
H         0.958100     1.494700    -0.787100
17
pentane
C        -4.043200     2.548300     0.154600
C        -2.748400     3.317400    -0.053600
H        -4.902100     3.220700     0.064500
H        -4.072500     2.091200     1.148800
H        -4.152800     1.755200    -0.592000
C        -1.528800     2.404000     0.066300
H        -2.685200     4.122400     0.687800
H        -2.765200     3.788400    -1.043400
C        -0.228600     3.184600    -0.124600
H        -1.592000     1.606100    -0.683800
H        -1.526500     1.921300     1.051200
C        -0.089200     4.294900     0.904600
H        -0.200100     3.620600    -1.130000
H         0.628700     2.506600    -0.040400
H        -0.915000     5.009100     0.825500
H         0.847200     4.839900     0.749600
H        -0.081700     3.888900     1.921200
17
pentane
C        -4.043200     2.548300     0.154600
C        -2.748400     3.317400    -0.053600
H        -4.902100     3.220700     0.064500
H        -4.072500     2.091200     1.148800
H        -4.152800     1.755200    -0.592000
C        -1.528800     2.404000     0.066300
H        -2.685200     4.122400     0.687800
H        -2.765200     3.788400    -1.043400
C        -0.228600     3.184600    -0.124600
H        -1.592000     1.606100    -0.683800
H        -1.526500     1.921300     1.051200
C        -0.162700     3.829400    -1.499900
H         0.629900     2.514400    -0.000300
H        -0.145900     3.963600     0.642100
H        -0.210400     3.073600    -2.290300
H         0.774800     4.382300    -1.614700
H        -0.990700     4.530900    -1.644300
17
pentane
C        -2.804500     3.996400    -1.412800
C        -2.748400     3.317400    -0.053600
H        -3.684000     4.644600    -1.476700
H        -2.867800     3.257600    -2.218100
H        -1.915400     4.612600    -1.580600
C        -1.528800     2.404000     0.066300
H        -3.665500     2.735900     0.095900
H        -2.718500     4.083500     0.729900
C        -0.228600     3.184600    -0.124600
H        -1.592000     1.606100    -0.683800
H        -1.526500     1.921300     1.051200
C         0.986600     2.276700    -0.021500
H        -0.153200     3.973500     0.632700
H        -0.229000     3.674800    -1.105100
H         1.034400     1.795200     0.960400
H         1.905000     2.855400    -0.161200
H         0.958100     1.494700    -0.787100
17
pentane
C        -2.729800     4.412900     1.000500
C        -2.748400     3.317400    -0.053600
H        -3.610700     5.054100     0.896400
H        -1.838700     5.040500     0.898400
H        -2.736800     3.987100     2.009000
C        -1.528800     2.404000     0.066300
H        -2.773500     3.776500    -1.048800
H        -3.666800     2.728900     0.055600
C        -0.228600     3.184600    -0.124600
H        -1.592000     1.606100    -0.683800
H        -1.526500     1.921300     1.051200
C         0.986600     2.276700    -0.021500
H        -0.153200     3.973500     0.632700
H        -0.229000     3.674800    -1.105100
H         1.034400     1.795200     0.960400
H         1.905000     2.855400    -0.161200
H         0.958100     1.494700    -0.787100
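
The printout above is a concatenation of XYZ blocks, one per conformer: the atom list is identical in every block and only the coordinates change. The sketch below splits such text into frames and checks that property. It is a pure-Python illustration, not molli's implementation, and `split_xyz_frames` and `same_atoms` are hypothetical helpers.

```python
def split_xyz_frames(text):
    """Split concatenated XYZ blocks into a list of (comment, elements, coords).

    Hypothetical helper for illustration; `elements` is a list of element
    symbols and `coords` is a list of (x, y, z) float tuples.
    """
    lines = text.splitlines()
    frames = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate blank lines between frames
            continue
        n_atoms = int(lines[i])
        comment = lines[i + 1].strip()
        atom_lines = lines[i + 2:i + 2 + n_atoms]
        if len(atom_lines) != n_atoms:
            raise ValueError("truncated XYZ frame")
        elements, coords = [], []
        for line in atom_lines:
            symbol, x, y, z = line.split()[:4]
            elements.append(symbol)
            coords.append((float(x), float(y), float(z)))
        frames.append((comment, elements, coords))
        i += 2 + n_atoms
    return frames


def same_atoms(frames):
    """True if every frame lists the same elements in the same order."""
    return all(frame[1] == frames[0][1] for frame in frames)
```

Applied to the text printed above, `split_xyz_frames(ens.dumps_xyz())` would be expected to return seven frames of 17 atoms each, for which `same_atoms` holds.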

In addition, conformer ensembles can be instantiated from a list of Molecule objects:

[7]:
file_path = ml.files.pentane_confs_mol2
mols = ml.load_all(file_path, otype='molecule')
print(mols)
ens = ml.ConformerEnsemble(mols)
print(ens.n_conformers)
print(f'Here is the ensemble: {ens}')

[Molecule(name='pentane', formula='C5 H12'), Molecule(name='pentane', formula='C5 H12'), Molecule(name='pentane', formula='C5 H12'), Molecule(name='pentane', formula='C5 H12'), Molecule(name='pentane', formula='C5 H12'), Molecule(name='pentane', formula='C5 H12'), Molecule(name='pentane', formula='C5 H12')]
7
Here is the ensemble: ConformerEnsemble(name='pentane', formula='C5 H12', n_conformers=7)

Substructure

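This class references a subset of the atoms and bonds of a parent structure and allows that subset to be manipulated. In the example below, shifting the substructure's coordinates also moves the corresponding atoms in the parent, as the second printout shows.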

[8]:
file_path = ml.files.dendrobine_mol2
dendrobine = ml.load(file_path, otype='molecule', name='dendrobine')

sub = ml.Substructure(dendrobine, (1, 3, 5))
print(sub.dumps_xyz())

sub.coords += 50.0
print(dendrobine.dumps_xyz())
3
<class 'molli.chem.structure.Substructure'>
C         0.057300    -0.022600     2.122700
C        -0.428400    -0.411300    -0.168700
C         2.456200     0.441700     1.836100

44
dendrobine
N         1.296000    -0.231900     1.267000
C        50.057300    49.977400    52.122700
C        -1.097400    -0.473800     1.205900
C        49.571600    49.588700    49.831300
C         0.868300     0.359800    -0.000400
C        52.456200    50.441700    51.836100
C        -2.432800     0.265900     0.983200
C        -2.655300     0.247200    -0.580100
C        -1.242800     0.562600    -1.016300
C         1.535300     0.579000    -1.417900
O         1.341000     2.036300    -1.650700
C         0.078200     2.229100    -2.216600
C        -0.597700     0.853900    -2.402000
O        -0.413500     3.318000    -2.455200
C         0.760200     0.167800    -2.700100
C         0.781200    -1.214100    -3.377300
H         1.242600     0.768400    -3.498200
H        -1.179400     1.558000    -0.537700
H         0.651600     1.395200     0.317400
H        -1.372200    -1.491000     1.513400
C        -0.282800    -1.883500    -0.595700
C         2.177500    -1.841200    -3.321500
C         0.348500    -1.100600    -4.851400
H         0.059000    -1.900900    -2.947800
H        -0.064900     1.026400     2.419800
H         0.116200    -0.640800     3.024400
H         2.715000    -0.004900     2.801900
H         2.292700     1.514700     1.987400
H         3.325500     0.316200     1.181800
H        -2.371600     1.296200     1.353400
H        -3.263900    -0.226100     1.498200
H        -3.381000     1.006900    -0.883600
H        -2.993400    -0.735800    -0.922200
H         2.608800     0.379800    -1.439400
H        -1.292900     0.877200    -3.242300
H        -0.307500    -2.585400     0.247500
H         0.692000    -2.089000    -1.031900
H        -1.094300    -2.190400    -1.261400
H         2.179500    -2.824100    -3.804900
H         2.517600    -1.982200    -2.291400
H         2.914300    -1.213500    -3.834300
H         0.352200    -2.085900    -5.330700
H         1.023200    -0.452000    -5.421000
H        -0.665200    -0.696700    -4.934600
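
In the output above, only atoms 1, 3, and 5 of dendrobine moved by 50 in each coordinate, which means the substructure writes through to the parent rather than working on a copy. The sketch below illustrates that view-like behavior in pure Python. It is a conceptual illustration only, not how molli implements `Substructure`; `CoordView` is a hypothetical class, and the coordinates are the first six dendrobine atoms from the printouts above, before the shift.

```python
class CoordView:
    """Hypothetical view onto selected rows of a parent coordinate list."""

    def __init__(self, parent_coords, indices):
        self._parent = parent_coords      # shared with the parent, not copied
        self._indices = tuple(indices)

    def coords(self):
        """Return the selected coordinates in index order."""
        return [self._parent[i] for i in self._indices]

    def shift(self, delta):
        """Add `delta` to x, y, and z of every selected atom, in the parent."""
        for i in self._indices:
            x, y, z = self._parent[i]
            self._parent[i] = (x + delta, y + delta, z + delta)


# First six dendrobine atoms (N, C, C, C, C, C), unshifted values.
coords = [
    (1.2960, -0.2319, 1.2670),
    (0.0573, -0.0226, 2.1227),
    (-1.0974, -0.4738, 1.2059),
    (-0.4284, -0.4113, -0.1687),
    (0.8683, 0.3598, -0.0004),
    (2.4562, 0.4417, 1.8361),
]

view = CoordView(coords, (1, 3, 5))
view.shift(50.0)
# Rows 1, 3 and 5 of `coords` are now shifted; rows 0, 2 and 4 are untouched.
```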