`molli` Study case: Cladosporin GIAO-DFT NMR prediction¶

Objective¶

The objective of this task is to use the molli workflow to transition from .CDXML molecular drawings to full-fledged 3D structures with subsequent CREST//GFN2-XTB conformer generation, ORCA equilibrium structure minimizations and GIAO-DFT NMR predictions. This workflow is particularly useful in validation of relative stereochemistry where experimental NMR is not conclusive alone (see e. g. J. Org. Chem. 2020, 85, 5, 3297)

Prerequisites¶

molli installation
pandas installation (it is NOT a dependency of molli and must be installed separately)
OpenBabel installation (it is NOT a dependency of molli and must be installed separately) Note: if you experience problems with openbabel refusing to optimize the structures, do not forget to set the BABEL_DATADIR environment variable to point to a valid location with openbabel data files. With conda installations, it seems to be consistently pointing to $CONDA_PREFIX/share/openbabel
openpyxl installation (it is NOT a dependency of molli and must be installed separately)
rmsd installation (it is NOT a dependency of molli and must be installed separately)
cladosporin.csv file that has the experimental NMR chemical shifts
cladosporin.cdxml file that contains the structural drawings

Methods¶

Conformer generation¶

Conformers are generated with iMTD-GC workflow as implemented in CREST.

[6]:

import molli as ml
import pandas as pd
import numpy as np

# This configures on-screen preview of molecules
ml.visual.configure(bgcolor="black", theme="dark")

Step 1. `CDXML` Parsing¶

In this step, all of the molecules are read from the CDXML file and converted into their 3D versions. This can be done from both the command line interface, as well as programmatically. Here, the command line interface will be shown.

[7]:

!molli parse cladosporin.cdxml -o cladosporin_g0.mlib --hadd

Parsing cladosporin.cdxml: 100%|█████████████████| 4/4 [00:00<00:00, 148.00it/s]

We will now use IPython line magic mlib_view to visualize the entries in the resulting library.

[8]:

%mlib_view cladosporin_g0.mlib cladosporin_1a

You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol

Step 2. OpenBabel preliminary optimization¶

In this step we will perform a preliminary optimization of these molecules with OpenBabel’s implementation of MMFF94 force field. This typically removes parsing artifacts resulting in unreasonable geometries which may break GFN2-XTB optimizations from time to time.

Important: do inspect the resulting structures as they do not always converge to the necessary results.

[9]:

from molli.external.openbabel import obabel_optimize

source_lib = ml.MoleculeLibrary("cladosporin_g0.mlib")
obabel_preopt = ml.MoleculeLibrary("cladosporin_g1.mlib", overwrite=True, readonly=False)


# NOTE: in certain unreasonable starting geometries, such as linear C-O-H geometries, planar tetrahedral atoms and many more
#   force field gradients corresponding to certain internal degrees of freedom seem to diverge
#   this results in the inability of the corresponding forcefield to perform the optimization
#   the only solution we were able to come up with to ameliorate this situation is to displace EVERY atom by a random vector of a certain magnitude


# These guards may look a bit clumsy
#   but they become enable synchronization of parallel processes working on the same libraries
with source_lib.reading(), obabel_preopt.writing():
    for k in source_lib.keys():
        optimized = obabel_optimize(
            mol=source_lib[k],
            ff="mmff94",
            max_steps=1000,
            tol=1e-4,
            coord_displace=0.1, # see note above
            inplace=False
        )

        obabel_preopt[k] = optimized

optimized

You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol

[9]:

Molecule(name='cladosporin_1d', formula='C16 H20 O5')

Step 3. GFN2-XTB Conformer generation¶

In this step the conformers of molecules in question will be generated. For this we will be utilizing molli.pipeline module. While it is still in development, it allows to simplify parallel calculations with intermediate result caching.

Note The code for this is in a separate file because jupyter notebook does not play nice with the thread pool executor.

[10]:

!python workflow.py

Running conformer generation with: crest 32
debug: nprocs 32
====================================================================================================
Submitting jobs: 0it [00:00, ?it/s]
Waiting for jobs: 0it [00:00, ?it/s]
Finalizing the calculations: 0it [00:00, ?it/s]
====================================================================================================
Submitting jobs: 0it [00:00, ?it/s]
Waiting for jobs: 0it [00:00, ?it/s]
Finalizing the calculations: 0it [00:00, ?it/s]
====================================================================================================
Starting a molli.pipeline.jobmap_sge calculation:
calculation: ORCADriver.optimize_ens
None
     source:  ConformerLibrary(backend=UkvCollectionBackend('cladosporin_g3.clib'), n_items=4)
destination:  ConformerLibrary(backend=UkvCollectionBackend('cladosporin_g4.clib'), n_items=0)
scratch dir:  /home/blakeo2/.molli/scratch
Total number of jobs:        4
Exist in destination:        0
To be computed:              4
Submitting jobs: 0it [00:00, ?it/s]
Waiting for jobs: 0it [00:00, ?it/s]
Finalizing the calculations: 100%|████████████████| 4/4 [00:03<00:00,  1.04it/s]
====================================================================================================
Starting a molli.pipeline.jobmap_sge calculation:
calculation: ORCADriver.giao_nmr_ens
None
     source:  ConformerLibrary(backend=UkvCollectionBackend('cladosporin_g4.clib'), n_items=4)
destination:  ConformerLibrary(backend=UkvCollectionBackend('cladosporin_g4_nmr.clib'), n_items=0)
scratch dir:  /home/blakeo2/.molli/scratch
Total number of jobs:        4
Exist in destination:        0
To be computed:              4
Submitting jobs: 0it [00:00, ?it/s]
Waiting for jobs: 0it [00:00, ?it/s]
Finalizing the calculations: 100%|████████████████| 4/4 [00:00<00:00,  7.07it/s]

[11]:

# This is not a necessary step but it does make our lives easier
!molli align -i cladosporin_g3.clib -o cladosporin_g3a.clib -q align_qry.mol2

The output path is cladosporin_g3a.clib
matching structures:
100%|█████████████████████████████████████████████| 4/4 [00:00<00:00, 82.68it/s]
bringing to an optimal rotation:
100%|█████████████████████████████████████████████| 4/4 [00:00<00:00, 22.36it/s]
Conformers checked: 340, maximal rmsd: 0.165, minimal rmsd: 0.042
Median rmsd: 0.095

[12]:

!molli align -i cladosporin_g4.clib -o cladosporin_g4a.clib -q align_qry.mol2
%clib_view cladosporin_g4a.clib cladosporin_1c

The output path is cladosporin_g4a.clib
matching structures:
100%|█████████████████████████████████████████████| 4/4 [00:00<00:00, 76.89it/s]
bringing to an optimal rotation:
100%|█████████████████████████████████████████████| 4/4 [00:00<00:00, 21.98it/s]
Conformers checked: 335, maximal rmsd: 0.088, minimal rmsd: 0.046
Median rmsd: 0.07

You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol

Computing NMR¶

NMR chemical shifts will be computed with linear correction according to $ \delta_i = k \sigma_i + b $, where $k = -0.9127, b = 170.87$ .

[13]:

expt_nmr = pd.read_excel("cladosporin-expt.xlsx")

nuclei = [str(x) for x in expt_nmr["Unnamed: 0"]]

mol_data_map = {
    "cladosporin_1a": "exp_1a",
    "cladosporin_1b": "exp_1b",
    "cladosporin_1c": "exp_1c",
    "cladosporin_1d": "exp_1d",
}

# See htpps://doi.org/10.1021/jacs.0c12569 --> Supporting information
# for more details about the scaling factor derivations
SCALE_K = -0.9127
SCALE_B = 170.87

RT = 8.314 * 0.298


library = ml.ConformerLibrary("cladosporin_g4_nmr.clib")

with library.reading():
    for mol_name, exp_name in mol_data_map.items():
        ensemble = library[mol_name]
        for a in ensemble.atoms:
            if a.element == ml.Element.C and a.label is None:
                a.label = "15"

        print(f"Computing NMR for conformer ensemble {mol_name!r}")
        energies = np.array(
            [d["ORCA/SCF_Energy"] for d in ensemble.attrib["conformer_properties"]]
        )

        rel_energies = 2625.5*(energies - np.min(energies))

        boltz_f = np.exp(rel_energies / (-RT))
        boltz_w = boltz_f / np.sum(boltz_f)

        shieldings_w = np.array(
            [
                np.average(a.attrib["NMR_shielding"], weights=boltz_w)
                for a in ensemble.get_atoms(*nuclei)
            ]
        )
        nmr_w = np.round(shieldings_w * SCALE_K + SCALE_B, 1)

        err = nmr_w - expt_nmr[exp_name]

        result = pd.DataFrame(
            data={
                "nuclei": nuclei,
                "expt_nmr": expt_nmr[exp_name],
                "pred_nmr": nmr_w,
                "error": err,
            }
        )

        print(result)
        print("-" * 20)
        print(f"{ensemble!r}, {ensemble.n_conformers=}")
        print(f"Highest Boltz weight: {np.max(boltz_w):0.3%}")
        print(f"Conformers (> 1% wt): {np.count_nonzero(boltz_w > 0.01)}")
        print(f"MAE= {np.average(np.abs(err)):5.1f}")
        print(f"RMS= {np.sqrt(np.average(err**2)):5.1f}")
        print(f"MAX= {np.max(np.abs(err)):5.1f}")
        print("=" * 80)

Computing NMR for conformer ensemble 'cladosporin_1a'
   nuclei  expt_nmr  pred_nmr  error
0       1     169.9     168.9   -1.0
1       3      76.3      75.6   -0.7
2       4      33.6      35.0    1.4
3      4a     141.8     143.9    2.1
4       5     106.7     103.6   -3.1
5       6     163.1     161.8   -1.3
6       7     102.0      98.9   -3.1
7       8     164.3     164.3    0.0
8      8a     101.5     100.0   -1.5
9       9      39.3      39.1   -0.2
10     10      66.6      65.9   -0.7
11     11      30.9      31.6    0.7
12     12      18.1      20.1    2.0
13     13      30.9      32.1    1.2
14     14      68.0      67.7   -0.3
15     15      18.9      19.1    0.2
--------------------
ConformerEnsemble(name='cladosporin_1a', formula='C16 H20 O5'), ensemble.n_conformers=85
Highest Boltz weight: 24.329%
Conformers (> 1% wt): 16
MAE=   1.2
RMS=   1.5
MAX=   3.1
================================================================================
Computing NMR for conformer ensemble 'cladosporin_1b'
   nuclei  expt_nmr  pred_nmr  error
0       1     170.0     169.0   -1.0
1       3      76.6      76.6    0.0
2       4      32.6      33.7    1.1
3      4a     141.6     143.8    2.2
4       5     106.7     103.7   -3.0
5       6     163.0     161.8   -1.2
6       7     102.0      98.9   -3.1
7       8     164.4     164.3   -0.1
8      8a     101.6     100.0   -1.6
9       9      37.3      37.5    0.2
10     10      67.2      67.1   -0.1
11     11      29.8      30.4    0.6
12     12      18.1      20.1    2.0
13     13      31.5      32.4    0.9
14     14      67.4      67.5    0.1
15     15      19.7      19.6   -0.1
--------------------
ConformerEnsemble(name='cladosporin_1b', formula='C16 H20 O5'), ensemble.n_conformers=113
Highest Boltz weight: 15.161%
Conformers (> 1% wt): 20
MAE=   1.1
RMS=   1.5
MAX=   3.1
================================================================================
Computing NMR for conformer ensemble 'cladosporin_1c'
   nuclei  expt_nmr  pred_nmr  error
0       1     171.7     168.9   -2.8
1       3      78.0      76.4   -1.6
2       4      34.5      33.9   -0.6
3      4a     143.6     143.7    0.1
4       5     108.1     103.7   -4.4
5       6     165.8     161.8   -4.0
6       7     102.3      98.9   -3.4
7       8     166.4     164.3   -2.1
8      8a     101.8     100.0   -1.8
9       9      42.2      41.3   -0.9
10     10      75.2      73.4   -1.8
11     11      32.5      32.0   -0.5
12     12      22.6      25.3    2.7
13     13      33.8      33.9    0.1
14     14      75.5      73.8   -1.7
15     15      24.7      22.2   -2.5
--------------------
ConformerEnsemble(name='cladosporin_1c', formula='C16 H20 O5'), ensemble.n_conformers=58
Highest Boltz weight: 24.882%
Conformers (> 1% wt): 15
MAE=   1.9
RMS=   2.3
MAX=   4.4
================================================================================
Computing NMR for conformer ensemble 'cladosporin_1d'
   nuclei  expt_nmr  pred_nmr  error
0       1     171.7     168.9   -2.8
1       3      77.6      75.3   -2.3
2       4      34.6      35.1    0.5
3      4a     143.7     143.9    0.2
4       5     108.0     103.7   -4.3
5       6     165.8     161.8   -4.0
6       7     102.3      98.9   -3.4
7       8     166.4     164.2   -2.2
8      8a     101.7      99.9   -1.8
9       9      43.0      42.5   -0.5
10     10      74.8      72.2   -2.6
11     11      33.1      32.7   -0.4
12     12      22.5      25.4    2.9
13     13      34.5      34.1   -0.4
14     14      75.3      73.6   -1.7
15     15      24.8      22.1   -2.7
--------------------
ConformerEnsemble(name='cladosporin_1d', formula='C16 H20 O5'), ensemble.n_conformers=79
Highest Boltz weight: 49.811%
Conformers (> 1% wt): 8
MAE=   2.0
RMS=   2.4
MAX=   4.3
================================================================================

molli Study case: Cladosporin GIAO-DFT NMR prediction¶

Objective¶

Prerequisites¶

Methods¶

Conformer generation¶

Step 1. CDXML Parsing¶

Step 2. OpenBabel preliminary optimization¶

Step 3. GFN2-XTB Conformer generation¶

Computing NMR¶

`molli` Study case: Cladosporin GIAO-DFT NMR prediction¶

Step 1. `CDXML` Parsing¶