{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Parallelized Calculations and Jobmapping\n",
    "\n",
    "`molli` implements a `jobmap` function that applies external drivers in parallel to the entries of `MoleculeLibrary` or `ConformerLibrary` objects. `molli` currently provides four drivers for geometry optimization, conformer generation, and property calculation. Jobs can be run either on a local computer with `jobmap` or on a computing cluster with `jobmap_sge`. For details on the structure and available methods of each driver, see the [molli.pipeline: External Drivers](../API/molli.pipeline/pipeline-notes.md) section."
   ]
  },
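  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conceptually, `jobmap` maps a job function over every entry of the source library, runs the resulting jobs on a pool of workers, and stores each result in the destination under the entry's name. The sketch below illustrates only that pattern with Python's standard `concurrent.futures` module. It is not `molli`'s implementation: `fake_job`, `map_jobs`, and the plain dictionaries standing in for libraries are hypothetical.\n",
    "\n",
    "```python\n",
    "from concurrent.futures import ThreadPoolExecutor\n",
    "\n",
    "\n",
    "def fake_job(item):\n",
    "    # Hypothetical stand-in for a driver method such as a conformer search\n",
    "    name, value = item\n",
    "    return name, value * 2\n",
    "\n",
    "\n",
    "def map_jobs(job, source, n_workers=4):\n",
    "    # Run job on every (name, entry) pair with n_workers threads\n",
    "    destination = {}\n",
    "    with ThreadPoolExecutor(max_workers=n_workers) as pool:\n",
    "        # pool.map yields results in input order; store them under their names\n",
    "        for name, result in pool.map(job, source.items()):\n",
    "            destination[name] = result\n",
    "    return destination\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    source = {'mol_a': 1, 'mol_b': 2, 'mol_c': 3}\n",
    "    print(map_jobs(fake_job, source, n_workers=2))  # {'mol_a': 2, 'mol_b': 4, 'mol_c': 6}\n",
    "```\n",
    "\n",
    "Threads keep the sketch simple and are adequate when each job mostly waits on an external program. With `n_workers=4` and a driver using `nprocs=16`, as in Example 1, up to 64 processes would run at once."
   ]
  },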
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 1: Running a Job on a Local Computer\n",
    "\n",
    "An example script is shown below. It runs a CREST conformer search on every molecule in `example.mlib` using four local workers:\n",
    "\n",
    "```python\n",
    "# Necessary imports\n",
    "import molli as ml\n",
    "from molli.pipeline.crest import CrestDriver\n",
    "\n",
    "# Source library that the molecules are read from\n",
    "source = ml.MoleculeLibrary(\"example.mlib\", readonly=True)\n",
    "\n",
    "# Destination library that the calculated conformer ensembles are written to\n",
    "destination = ml.ConformerLibrary(\"example_result.clib\", readonly=False)\n",
    "\n",
    "# Configure the driver: the CREST executable and the number of processes per worker (memory can also be set)\n",
    "crest = CrestDriver(\"crest\", nprocs=16)\n",
    "\n",
    "ml.pipeline.jobmap(\n",
    "    crest.conformer_search,\n",
    "    source=source, #Source of molecules\n",
    "    destination=destination, #Where conformers will be written\n",
    "    cache_dir=\"./conf_cache\", # Cache for job inputs and outputs, whether or not the job succeeded\n",
    "    scratch_dir=\"./scratch_dir\", # Scratch directory where the calculations are run\n",
    "    n_workers=4, # 4 parallel workers, each using the 16 processes set in the driver (64 in total)\n",
    "    kwargs={\n",
    "        \"method\": \"gfnff\", # Use the GFN-FF force field\n",
    "        \"temp\": 298.15, # Temperature in K\n",
    "        \"chk_topo\": True, # Check the topology\n",
    "    }, # Keyword arguments passed directly to crest.conformer_search\n",
    "    progress=True, # Report progress\n",
    "    verbose=True, # Print additional information\n",
    ")\n",
    "```\n",
    "\n",
    "This creates a conformer library at `example_result.clib` and writes all job inputs and outputs to `./conf_cache`. The cache directory contains an `input` folder with the formatted inputs used to submit each calculation and an `output` folder with the encoded outputs (i.e., stored as bytes)."
   ]
  },
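  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since every job leaves a result in the cache, the `output` folder can also show which molecules do not have an output yet, for example before resubmitting. The sketch below assumes, as Example 3 does, that each job writes one `<name>.out` file to `<cache_dir>/output`; `missing_outputs` is a hypothetical helper that uses only the standard library.\n",
    "\n",
    "```python\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def missing_outputs(cache_dir, names):\n",
    "    # Assumes one '<name>.out' file per completed job in <cache_dir>/output\n",
    "    output_dir = Path(cache_dir) / 'output'\n",
    "    if not output_dir.is_dir():\n",
    "        return list(names)  # No outputs written yet\n",
    "    done = {p.stem for p in output_dir.glob('*.out')}\n",
    "    # Keep the original order of the names that have no output file\n",
    "    return [name for name in names if name not in done]\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    print(missing_outputs('./conf_cache', ['mol_a', 'mol_b']))\n",
    "```"
   ]
  },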
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 2: Running a Job on a Cluster\n",
    "\n",
    "Users will often want to run jobs on a computational cluster. For this, a separate function, `jobmap_sge`, submits jobs through the scheduler. It is designed for clusters running Oracle Grid Engine (also known as Sun Grid Engine) for batch job submission. It works like `jobmap`, with two differences: the collection of `JobInput` instances is passed to a process that runs `qsub` instead of to a local executor, and `n_workers` no longer needs to be specified.\n",
    "\n",
    "```python\n",
    "# Necessary imports\n",
    "import molli as ml\n",
    "from molli.pipeline.crest import CrestDriver\n",
    "\n",
    "# Source library that the molecules are read from\n",
    "source = ml.MoleculeLibrary(\"example.mlib\", readonly=True)\n",
    "\n",
    "# Destination library that the calculated conformer ensembles are written to\n",
    "destination = ml.ConformerLibrary(\"example_result.clib\", readonly=False)\n",
    "\n",
    "# Configure the driver: the CREST executable and the number of processes per job (memory can also be set)\n",
    "crest = CrestDriver(\"crest\", nprocs=16)\n",
    "\n",
    "ml.pipeline.jobmap_sge(\n",
    "    crest.conformer_search,\n",
    "    source,\n",
    "    destination,\n",
    "    cache_dir=\"./conf_cache\", # Cache for job inputs and outputs, whether or not the job succeeded\n",
    "    scratch_dir=\"./scratch_dir\", # Scratch directory where the calculations are run\n",
    "    kwargs={\n",
    "        \"method\": \"gfnff\", # Use the GFN-FF force field\n",
    "        \"temp\": 298.15, # Temperature in K\n",
    "        \"chk_topo\": True, # Check the topology\n",
    "    }, # Keyword arguments passed directly to crest.conformer_search\n",
    "    progress=True, # Report progress\n",
    "    verbose=True, # Print additional information\n",
    "    qsub_header=\"#$ -pe orte 16\\n\", # SGE parallel environment and number of slots (matches nprocs=16)\n",
    ")\n",
    "```"
   ]
  },
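  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `qsub_header` holds Grid Engine `#$` directives, several directives can presumably be combined in one string, one per line. The sketch below uses a hypothetical helper, `make_sge_header`, that builds such a string and keeps the requested slot count equal to the driver's `nprocs`. The `orte` parallel environment and the `-cwd` and `-j y` options are only examples; the available parallel environments depend on how your cluster is configured.\n",
    "\n",
    "```python\n",
    "def make_sge_header(nprocs, pe='orte', extra=()):\n",
    "    # Hypothetical helper: request as many slots as the driver's nprocs\n",
    "    lines = [f'#$ -pe {pe} {nprocs}']\n",
    "    # Additional directives, e.g. '-cwd' to run the job from the current working directory\n",
    "    lines.extend(f'#$ {directive}' for directive in extra)\n",
    "    # One directive per line, each ending in a newline as in the example above\n",
    "    return ''.join(line + '\\n' for line in lines)\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    print(make_sge_header(16, extra=['-cwd', '-j y']), end='')\n",
    "```\n",
    "\n",
    "With the defaults, `make_sge_header(16)` returns the same string as the `qsub_header` used in Example 2."
   ]
  },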
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 3: Loading Encoded Output Files\n",
    "\n",
    "If you need additional information from an output file, or if a library was written incorrectly, you can read the encoded outputs directly from the cache and process them yourself. The example below rebuilds the conformer library from the cached CREST outputs:\n",
    "\n",
    "```python\n",
    "# Necessary imports\n",
    "import molli as ml\n",
    "from glob import glob\n",
    "from pathlib import Path\n",
    "from tqdm import tqdm\n",
    "\n",
    "# Source library that the molecules are read from\n",
    "source = ml.MoleculeLibrary(\"example.mlib\", readonly=True)\n",
    "\n",
    "# Destination library that the rebuilt conformer ensembles are written to\n",
    "destination = ml.ConformerLibrary(\"example_result.clib\", readonly=False)\n",
    "\n",
    "# Open the source for reading and the destination for writing\n",
    "with source.reading(), destination.writing():\n",
    "    for file in tqdm(glob('./conf_cache/output/*.out')):\n",
    "        res = ml.pipeline.JobOutput.load(file) # Load the encoded output from the cache directory\n",
    "        name = Path(file).stem # File name without extension, i.e. the molecule name\n",
    "        m = source[name] # Retrieve the molecule with the same name from the source library\n",
    "\n",
    "        # Skip jobs that did not produce a conformer file (e.g. failed runs)\n",
    "        if \"crest_conformers.xyz\" not in res.files:\n",
    "            continue\n",
    "\n",
    "        # Parse all conformer geometries from the CREST output\n",
    "        all_geoms = ml.CartesianGeometry.loads_all_xyz(\n",
    "            res.files[\"crest_conformers.xyz\"].decode()\n",
    "        )\n",
    "\n",
    "        # Create a conformer ensemble with one (blank) conformer per geometry\n",
    "        result = ml.ConformerEnsemble(m, n_conformers=len(all_geoms))\n",
    "\n",
    "        # Copy the coordinates of each geometry into its conformer\n",
    "        for blank_conf, conf_geom in zip(result, all_geoms):\n",
    "            blank_conf.coords = conf_geom.coords\n",
    "\n",
    "        destination[name] = result\n",
    "```"
   ]
  }
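  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `crest_conformers.xyz` file decoded in Example 3 is a multi-structure XYZ file: each conformer starts with an atom-count line and a comment line, followed by one line per atom. The standard-library sketch below only illustrates that layout, for instance to count the conformers in a cached output before rebuilding an ensemble. It is not how `ml.CartesianGeometry.loads_all_xyz` works internally, and `split_xyz_frames` is a hypothetical name.\n",
    "\n",
    "```python\n",
    "def split_xyz_frames(text):\n",
    "    # Split a multi-structure XYZ string into frames (lists of lines)\n",
    "    lines = text.splitlines()\n",
    "    frames = []\n",
    "    i = 0\n",
    "    while i < len(lines):\n",
    "        if not lines[i].strip():\n",
    "            i += 1  # Skip blank lines between frames\n",
    "            continue\n",
    "        n_atoms = int(lines[i])\n",
    "        # A frame is the atom-count line, the comment line, and n_atoms coordinate lines\n",
    "        frames.append(lines[i : i + n_atoms + 2])\n",
    "        i += n_atoms + 2\n",
    "    return frames\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    xyz = '2\\nconf 1\\nH 0.0 0.0 0.0\\nH 0.0 0.0 0.74\\n2\\nconf 2\\nH 0.0 0.0 0.0\\nH 0.0 0.0 0.75\\n'\n",
    "    print(len(split_xyz_frames(xyz)))  # 2\n",
    "```"
   ]
  }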
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "dev-blake",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}