Parallelized Calculations and Jobmapping

molli provides a jobmap function that applies external drivers to MoleculeLibrary or ConformerLibrary objects in parallel. molli currently has 4 unique drivers for various geometry optimization, conformer generation, and property calculation methods. Calculations submitted through jobmap can be run on either a local computer or a cluster of computers. For information about the structure and available methods, see the molli.pipeline: External Drivers section.

Example 1: Running a Job on a Local Computer

An example script is shown below:

#Necessary imports
import molli as ml
from molli.pipeline.crest import CrestDriver

#This is the file the Molecules are retrieved from
source = ml.MoleculeLibrary("example.mlib", readonly=True)

#This is the file the conformer ensembles calculated will be written to.
destination = ml.ConformerLibrary("example_result.clib", readonly=False)

#This configures the driver and the number of processes each worker uses. The amount of memory to use can also be indicated.
crest = CrestDriver("crest", nprocs=16)

ml.pipeline.jobmap(
    crest.conformer_search,
    source=source, #Source of molecules
    destination=destination, #Where conformers will be written
    cache_dir="./conf_cache", #Where final outputs will be written, successful or not!
    scratch_dir="./scratch_dir", #Scratch Directory where calculations will be run
    n_workers=4, #Number of workers to use. In this case, 4 workers, each with 16 processors as defined in the driver.
    kwargs={
        "method": "gfnff", #GFNFF method to be used
        "temp": 298.15, #Temperature to assume
        "chk_topo": True, #Will check topology
    }, #These are arguments used in the conformer_search function and can be specified directly
    progress=True, #Will print out progress
    verbose=True, #Will print out extra information
)

This will create a Conformer Library at the path example_result.clib and write all inputs and outputs to ./conf_cache. The cache directory contains an input folder, which holds the formatted inputs used to submit calculations, and an output folder, which holds the encoded outputs (i.e. written in bytes).
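To check what a run left behind, the cache directory can be inspected with the standard library alone. The sketch below counts the files in each subfolder of the cache. The folder name output appears in Example 3, while the name input and the helper summarize_cache are assumptions made for illustration and are not part of molli.

```python
from pathlib import Path


def summarize_cache(cache_dir):
    """Count the files in each immediate subfolder of a jobmap cache directory.

    Returns a dict mapping subfolder name -> number of files directly inside it.
    A missing cache directory yields an empty dict.
    """
    root = Path(cache_dir)
    if not root.is_dir():
        return {}
    return {
        sub.name: sum(1 for f in sub.iterdir() if f.is_file())
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }


# Hypothetical usage after Example 1 has run (folder names are assumptions):
for folder, n_files in summarize_cache("./conf_cache").items():
    print(f"{folder}: {n_files} file(s)")
```

If the number of files in the output folder is lower than the number of molecules in the source library, some jobs may not have produced a cached result yet.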

Example 2: Running a Job on a Cluster

For users who want to run calculations on a computational cluster, molli provides a separate function, jobmap_sge, that submits jobs through the scheduler. It was designed for clusters configured with the Oracle Grid Engine (also known as Sun Grid Engine) for batch submission of jobs. It has the same functionality as jobmap, with two differences: the collection of all JobInput instances is passed to a process that runs a qsub command instead of a local executor, and n_workers no longer needs to be specified.

#Necessary imports
import molli as ml
from molli.pipeline.crest import CrestDriver

#This is the file the Molecules are retrieved from
source = ml.MoleculeLibrary("example.mlib", readonly=True)

#This is the file the conformer ensembles calculated will be written to.
destination = ml.ConformerLibrary("example_result.clib", readonly=False)

#This configures the driver and the number of processes each worker uses. The amount of memory to use can also be indicated.
crest = CrestDriver("crest", nprocs=16)

ml.pipeline.jobmap_sge(
    crest.conformer_search,
    source,
    destination,
    cache_dir="./conf_cache", #Where final outputs will be written, successful or not!
    scratch_dir="./scratch_dir", #Scratch Directory where calculations will be run
    kwargs={
        "method": "gfnff", #GFNFF method to be used
        "temp": 298.15, #Temperature to assume
        "chk_topo": True, #Will check topology
    }, #These are arguments used in the conformer_search function and can be specified directly
    progress=True, #Will print out progress
    verbose=True, #Will print out extra information
    qsub_header="#$ -pe orte 16\n", #This specifies the parallel environment and number of slots
)
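In the call above, the slot count in qsub_header (16) repeats the nprocs value given to the driver. A minimal sketch of building the header from a single constant, so the two values cannot drift apart, is shown here. build_qsub_header is a hypothetical helper, not a molli function, and the parallel environment name (orte in the example) depends on how the cluster is configured.

```python
def build_qsub_header(parallel_env, slots, extra_directives=()):
    """Build a Grid Engine directive header for use as a qsub_header string.

    Each directive becomes one '#$ ...' line terminated by a newline.
    """
    if slots < 1:
        raise ValueError("slots must be a positive integer")
    lines = [f"#$ -pe {parallel_env} {slots}"]
    lines.extend(f"#$ {directive}" for directive in extra_directives)
    return "".join(line + "\n" for line in lines)


NPROCS = 16  # single value shared by the driver and the scheduler request

header = build_qsub_header("orte", NPROCS)
print(repr(header))  # prints '#$ -pe orte 16\n'
```

The resulting string could then be passed as qsub_header=header, with the driver configured from the same constant, e.g. CrestDriver("crest", nprocs=NPROCS).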

Example 3: Loading Encoded Output Files

If additional information is needed from an output file, or if a library was written incorrectly, the encoded outputs in the cache can be loaded and processed directly. An example of this is shown below:

#Necessary imports
import molli as ml
from glob import glob
from pathlib import Path
from tqdm import tqdm

#This is the file the Molecules are retrieved from
source = ml.MoleculeLibrary("example.mlib", readonly=True)

#This is the file the conformer ensembles calculated will be written to.
destination = ml.ConformerLibrary("example_result.clib", readonly=False)

#This reads and writes to the respective files
with source.reading(), destination.writing():
    for file in tqdm(glob('./conf_cache/output/*.out')):
        res = ml.pipeline.JobOutput.load(file) # Loads the Output file from the cache directory
        name = Path(file).stem #Gives name of file
        m = source[name] #Retrieves matching name from the source library

        #This retrieves the conformer geometry
        all_geoms = ml.CartesianGeometry.loads_all_xyz(
            res.files["crest_conformers.xyz"].decode()
        )

        # This creates a conformer ensemble
        result = ml.ConformerEnsemble(m, n_conformers=len(all_geoms))

        # This updates the coordinates of all the conformers
        for blank_conf, conf_geom in zip(result, all_geoms):
            blank_conf.coords = conf_geom.coords

        destination[name] = result
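Before rebuilding a library from the cache, it can help to know which molecules have no cached output at all. The sketch below compares a collection of expected names against the .out files in the output folder, using the same file-stem convention as the loop above. find_missing_outputs and the example names are hypothetical; how the names are obtained from the source library is left open here.

```python
from pathlib import Path


def find_missing_outputs(expected_names, output_dir, suffix=".out"):
    """Return the sorted names that have no '<name><suffix>' file in output_dir."""
    out = Path(output_dir)
    # Collect the stems of all cached output files; empty if the folder is absent
    present = {p.stem for p in out.glob(f"*{suffix}")} if out.is_dir() else set()
    return sorted(set(expected_names) - present)


# Hypothetical names; in practice these would come from the source library.
missing = find_missing_outputs(["mol_a", "mol_b"], "./conf_cache/output")
print(f"{len(missing)} molecule(s) without a cached output: {missing}")
```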