{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Welcome to `molli`\n", "\n", "## About this tutorial\n", "This notebook illustrates a few fundamental principles of the new `molli` package. The difference between old-style and new-style `molli` is stark, so this introductory tutorial should be useful to both experienced users and newcomers." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic structure of the `molli` package\n", "\n", "### Subpackages" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# This is meant to be as iconic as `import numpy as np` :)\n", "import molli as ml" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Command line\n", "\n", "`molli` features a number of standalone scripts for standard procedures, such as parsing a .CDXML file or compiling a collection." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: molli [-C ] [-L ] [-v] [-H] [-V]\n", " {list,align,combine,compile,gbca,grid,info,ls,parse,recollect,run,show,stats,test}\n", "\n", "MOLLI package is an API that intends to create a concise and easy-to-use\n", "syntax that encompasses the needs of cheminformatics (especially so, but not\n", "limited to the workflows developed and used in the Denmark laboratory.\n", "\n", "positional arguments:\n", " {list,align,combine,compile,gbca,grid,info,ls,parse,recollect,run,show,stats,test}\n", " This is main command that invokes a specific\n", " standalone routine in MOLLI. 
To get full explanation\n", " of available commands, run `molli list`\n", "\n", "options:\n", " -C , --CONFIG \n", " Sets the file from which molli configuration will be\n", " read from\n", " -L , --LOG \n", " Sets the file that will contain the output of molli\n", " routines.\n", " -v, --verbose Sets the level of verbosity for molli output.\n", " -H, --HELP show help message and exit\n", " -V, --VERSION show program's version number and exit\n" ] } ], "source": [ "# This is a shell command\n", "!molli --HELP" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32mmolli combine\n", "\u001b[0m\u001b[32mmolli compile\n", "\u001b[0m\u001b[32mmolli gbca\n", "\u001b[0m\u001b[32mmolli grid\n", "\u001b[0m\u001b[32mmolli info\n", "\u001b[0m\u001b[32mmolli ls\n", "\u001b[0m\u001b[32mmolli parse\n", "\u001b[0m\u001b[32mmolli recollect\n", "\u001b[0m\u001b[32mmolli show\n", "\u001b[0m\u001b[32mmolli stats\n", "\u001b[0m\u001b[32mmolli test\n", "\u001b[0m" ] } ], "source": [ "# This is a shell command\n", "!molli list" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Basic objects\n", "\n", "Molli features classes that are meant to construct and represent arbitrary chemical entities. They can be constructed completely programmatically or by importing the data from a saved file. 
\n", "\n", "## Molecule\n", "\n", "Molecules can be instantiated in a few key ways. Here are three ways to load a mol2 file:\n", "\n", "```python\n", "# This function imports a mol2 file from a string\n", "mol = ml.Molecule.loads_mol2(mol2_string)\n", "\n", "# or, similarly, from a file stream\n", "mol = ml.Molecule.load_mol2(file_io)\n", "\n", "# or from a file path\n", "mol = ml.Molecule.load_mol2(file_path)\n", "```\n", "Here is an example of this in action:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Path to a test file /home/blakeo2/new_molli/molli_dev/molli/molli/files/benzene.mol2\n", "This is the Molecule: Molecule(name='benzene', formula='C6 H6')\n", "Here is the molecule as an XYZ File\n", "12\n", "benzene\n", "C -2.424200 1.134800 -0.000000\n", "C -3.698300 0.567200 -0.000000\n", "C -3.843800 -0.820000 -0.000000\n", "C -2.715200 -1.639600 -0.000000\n", "C -1.441100 -1.072000 -0.000000\n", "C -1.295600 0.315200 -0.000000\n", "H -2.310800 2.215600 -0.000000\n", "H -4.577600 1.205800 -0.000000\n", "H -4.836500 -1.262200 -0.000000\n", "H -2.828600 -2.720400 -0.000000\n", "H -0.561800 -1.710600 0.000000\n", "H -0.302900 0.757400 0.000000\n", "\n" ] } ], "source": [ "# Example file path available within molli\n", "fpath = ml.files.benzene_mol2\n", "print(\"Path to a test file\", fpath)\n", "\n", "# Loads a molecule object from the file path\n", "mol = ml.Molecule.load_mol2(fpath)\n", "print(f'This is the Molecule: {mol}')\n", "print(\"Here is the molecule as an XYZ File\")\n", "print(mol.dumps_xyz())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`molli` is natively built to read in objects from three distinct formats:\n", "\n", "`SYBYL_MOL2`\n", "\n", "`XYZ` (this will not automatically perceive bonds/connectivity!)\n", "\n", "`CDXML` (this will not automatically perceive hydrogens!)\n", "\n", "\n", "`OpenBabel` is an essential tool in cheminformatics, 
uniting many formats under a single molecular structure, `OBMol`, that can easily be converted between various file formats. We have designed an interface between `molli` and `OpenBabel` that allows imports from almost any known chemical format into `molli`.\n", "\n", "`OpenBabel` is not a required dependency, however, and must be installed independently to use this functionality (https://github.com/openbabel/openbabel).\n", "\n", "An example with the `mol` format is shown below:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Molecule(name='dendrobine', formula='C16 H25 N1 O2')" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "file_path = ml.files.dendrobine_molv3\n", "\n", "# This loads the MOLFILE into molli using OpenBabel\n", "mol = ml.load(file_path, fmt='mol', parser='openbabel', otype=\"molecule\", name='dendrobine')\n", "mol" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## ConformerEnsemble\n", "\n", "This is a fundamental class of `molli` that can be thought of as a single shared set of atoms and bonds paired with a collection of varying coordinates, one set per conformer. 
These can be loaded in very similar fashion to `Molecule` objects:\n", "\n", "```python\n", "# This function imports a mol2 file from a string\n", "ens = ml.ConformerEnsemble.loads_mol2(mol2_string)\n", "\n", "# or, similarly, from a file stream\n", "ens = ml.ConformerEnsemble.load_mol2(file_io)\n", "\n", "# or file path\n", "ens = ml.ConformerEnsemble.load_mol2(file_path)\n", "```\n", "\n", "Here is an example of this in action" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Here is the ensemble: ConformerEnsemble(name='pentane', formula='C5 H12', n_conformers=7)\n", "Here are the XYZ coordinates of the full ensemble\n", "17\n", "pentane\n", "C -2.804500 3.996400 -1.412800\n", "C -2.748400 3.317400 -0.053600\n", "H -3.684000 4.644600 -1.476700\n", "H -2.867800 3.257600 -2.218100\n", "H -1.915400 4.612600 -1.580600\n", "C -1.528800 2.404000 0.066300\n", "H -3.665500 2.735900 0.095900\n", "H -2.718500 4.083500 0.729900\n", "C -0.228600 3.184600 -0.124600\n", "H -1.592000 1.606100 -0.683800\n", "H -1.526500 1.921300 1.051200\n", "C -0.089200 4.294900 0.904600\n", "H -0.200100 3.620600 -1.130000\n", "H 0.628700 2.506600 -0.040400\n", "H -0.915000 5.009100 0.825500\n", "H 0.847200 4.839900 0.749600\n", "H -0.081700 3.888900 1.921200\n", "17\n", "pentane\n", "C -2.729800 4.412900 1.000500\n", "C -2.748400 3.317400 -0.053600\n", "H -3.610700 5.054100 0.896400\n", "H -1.838700 5.040500 0.898400\n", "H -2.736800 3.987100 2.009000\n", "C -1.528800 2.404000 0.066300\n", "H -2.773500 3.776500 -1.048800\n", "H -3.666800 2.728900 0.055600\n", "C -0.228600 3.184600 -0.124600\n", "H -1.592000 1.606100 -0.683800\n", "H -1.526500 1.921300 1.051200\n", "C -0.162700 3.829400 -1.499900\n", "H 0.629900 2.514400 -0.000300\n", "H -0.145900 3.963600 0.642100\n", "H -0.210400 3.073600 -2.290300\n", "H 0.774800 4.382300 -1.614700\n", "H -0.990700 4.530900 -1.644300\n", "17\n", "pentane\n", "C -4.043200 
2.548300 0.154600\n", "C -2.748400 3.317400 -0.053600\n", "H -4.902100 3.220700 0.064500\n", "H -4.072500 2.091200 1.148800\n", "H -4.152800 1.755200 -0.592000\n", "C -1.528800 2.404000 0.066300\n", "H -2.685200 4.122400 0.687800\n", "H -2.765200 3.788400 -1.043400\n", "C -0.228600 3.184600 -0.124600\n", "H -1.592000 1.606100 -0.683800\n", "H -1.526500 1.921300 1.051200\n", "C 0.986600 2.276700 -0.021500\n", "H -0.153200 3.973500 0.632700\n", "H -0.229000 3.674800 -1.105100\n", "H 1.034400 1.795200 0.960400\n", "H 1.905000 2.855400 -0.161200\n", "H 0.958100 1.494700 -0.787100\n", "17\n", "pentane\n", "C -4.043200 2.548300 0.154600\n", "C -2.748400 3.317400 -0.053600\n", "H -4.902100 3.220700 0.064500\n", "H -4.072500 2.091200 1.148800\n", "H -4.152800 1.755200 -0.592000\n", "C -1.528800 2.404000 0.066300\n", "H -2.685200 4.122400 0.687800\n", "H -2.765200 3.788400 -1.043400\n", "C -0.228600 3.184600 -0.124600\n", "H -1.592000 1.606100 -0.683800\n", "H -1.526500 1.921300 1.051200\n", "C -0.089200 4.294900 0.904600\n", "H -0.200100 3.620600 -1.130000\n", "H 0.628700 2.506600 -0.040400\n", "H -0.915000 5.009100 0.825500\n", "H 0.847200 4.839900 0.749600\n", "H -0.081700 3.888900 1.921200\n", "17\n", "pentane\n", "C -4.043200 2.548300 0.154600\n", "C -2.748400 3.317400 -0.053600\n", "H -4.902100 3.220700 0.064500\n", "H -4.072500 2.091200 1.148800\n", "H -4.152800 1.755200 -0.592000\n", "C -1.528800 2.404000 0.066300\n", "H -2.685200 4.122400 0.687800\n", "H -2.765200 3.788400 -1.043400\n", "C -0.228600 3.184600 -0.124600\n", "H -1.592000 1.606100 -0.683800\n", "H -1.526500 1.921300 1.051200\n", "C -0.162700 3.829400 -1.499900\n", "H 0.629900 2.514400 -0.000300\n", "H -0.145900 3.963600 0.642100\n", "H -0.210400 3.073600 -2.290300\n", "H 0.774800 4.382300 -1.614700\n", "H -0.990700 4.530900 -1.644300\n", "17\n", "pentane\n", "C -2.804500 3.996400 -1.412800\n", "C -2.748400 3.317400 -0.053600\n", "H -3.684000 4.644600 -1.476700\n", "H -2.867800 3.257600 -2.218100\n", "H 
-1.915400 4.612600 -1.580600\n", "C -1.528800 2.404000 0.066300\n", "H -3.665500 2.735900 0.095900\n", "H -2.718500 4.083500 0.729900\n", "C -0.228600 3.184600 -0.124600\n", "H -1.592000 1.606100 -0.683800\n", "H -1.526500 1.921300 1.051200\n", "C 0.986600 2.276700 -0.021500\n", "H -0.153200 3.973500 0.632700\n", "H -0.229000 3.674800 -1.105100\n", "H 1.034400 1.795200 0.960400\n", "H 1.905000 2.855400 -0.161200\n", "H 0.958100 1.494700 -0.787100\n", "17\n", "pentane\n", "C -2.729800 4.412900 1.000500\n", "C -2.748400 3.317400 -0.053600\n", "H -3.610700 5.054100 0.896400\n", "H -1.838700 5.040500 0.898400\n", "H -2.736800 3.987100 2.009000\n", "C -1.528800 2.404000 0.066300\n", "H -2.773500 3.776500 -1.048800\n", "H -3.666800 2.728900 0.055600\n", "C -0.228600 3.184600 -0.124600\n", "H -1.592000 1.606100 -0.683800\n", "H -1.526500 1.921300 1.051200\n", "C 0.986600 2.276700 -0.021500\n", "H -0.153200 3.973500 0.632700\n", "H -0.229000 3.674800 -1.105100\n", "H 1.034400 1.795200 0.960400\n", "H 1.905000 2.855400 -0.161200\n", "H 0.958100 1.494700 -0.787100\n", "\n" ] } ], "source": [ "file_path = ml.files.pentane_confs_mol2\n", "\n", "ens = ml.load(file_path, fmt='mol2', otype='ensemble')\n", "print(f'Here is the ensemble: {ens}')\n", "print('Here are the XYZ coordinates of the full ensemble')\n", "print(ens.dumps_xyz())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition, conformer ensembles can be instantiated from a list of mols" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[Molecule(name='pentane', formula='C5 H12'), Molecule(name='pentane', formula='C5 H12'), Molecule(name='pentane', formula='C5 H12'), Molecule(name='pentane', formula='C5 H12'), Molecule(name='pentane', formula='C5 H12'), Molecule(name='pentane', formula='C5 H12'), Molecule(name='pentane', formula='C5 H12')]\n", "7\n", "Here is the ensemble: ConformerEnsemble(name='pentane', 
formula='C5 H12', n_conformers=7)\n" ] } ], "source": [ "file_path = ml.files.pentane_confs_mol2\n", "mols = ml.load_all(file_path, otype='molecule')\n", "print(mols)\n", "ens = ml.ConformerEnsemble(mols)\n", "print(ens.n_conformers)\n", "print(f'Here is the ensemble: {ens}')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Substructure\n", "\n", "This class pulls atoms and bonds from the parent structure, and allows for manipulation of a subset of atoms within the initial structure" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3\n", "\n", "C 0.057300 -0.022600 2.122700\n", "C -0.428400 -0.411300 -0.168700\n", "C 2.456200 0.441700 1.836100\n", "\n", "44\n", "dendrobine\n", "N 1.296000 -0.231900 1.267000\n", "C 50.057300 49.977400 52.122700\n", "C -1.097400 -0.473800 1.205900\n", "C 49.571600 49.588700 49.831300\n", "C 0.868300 0.359800 -0.000400\n", "C 52.456200 50.441700 51.836100\n", "C -2.432800 0.265900 0.983200\n", "C -2.655300 0.247200 -0.580100\n", "C -1.242800 0.562600 -1.016300\n", "C 1.535300 0.579000 -1.417900\n", "O 1.341000 2.036300 -1.650700\n", "C 0.078200 2.229100 -2.216600\n", "C -0.597700 0.853900 -2.402000\n", "O -0.413500 3.318000 -2.455200\n", "C 0.760200 0.167800 -2.700100\n", "C 0.781200 -1.214100 -3.377300\n", "H 1.242600 0.768400 -3.498200\n", "H -1.179400 1.558000 -0.537700\n", "H 0.651600 1.395200 0.317400\n", "H -1.372200 -1.491000 1.513400\n", "C -0.282800 -1.883500 -0.595700\n", "C 2.177500 -1.841200 -3.321500\n", "C 0.348500 -1.100600 -4.851400\n", "H 0.059000 -1.900900 -2.947800\n", "H -0.064900 1.026400 2.419800\n", "H 0.116200 -0.640800 3.024400\n", "H 2.715000 -0.004900 2.801900\n", "H 2.292700 1.514700 1.987400\n", "H 3.325500 0.316200 1.181800\n", "H -2.371600 1.296200 1.353400\n", "H -3.263900 -0.226100 1.498200\n", "H -3.381000 1.006900 -0.883600\n", "H -2.993400 -0.735800 -0.922200\n", "H 2.608800 0.379800 -1.439400\n", 
"H -1.292900 0.877200 -3.242300\n", "H -0.307500 -2.585400 0.247500\n", "H 0.692000 -2.089000 -1.031900\n", "H -1.094300 -2.190400 -1.261400\n", "H 2.179500 -2.824100 -3.804900\n", "H 2.517600 -1.982200 -2.291400\n", "H 2.914300 -1.213500 -3.834300\n", "H 0.352200 -2.085900 -5.330700\n", "H 1.023200 -0.452000 -5.421000\n", "H -0.665200 -0.696700 -4.934600\n", "\n" ] } ], "source": [ "file_path = ml.files.dendrobine_mol2\n", "dendrobine = ml.load(file_path, otype='molecule', name='dendrobine')\n", "\n", "sub = ml.Substructure(dendrobine, (1, 3, 5))\n", "print(sub.dumps_xyz())\n", "\n", "sub.coords += 50.0\n", "print(dendrobine.dumps_xyz())" ] } ], "metadata": { "kernelspec": { "display_name": "dev-blake", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.6" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }