Libraries and Serialization

Molli library files are highly optimized binary storage formats developed for efficient, constant-time access to elements in large files. There are two key library formats: MoleculeLibrary and ConformerLibrary.

Both library formats take advantage of a “Lazy Loading” approach, in which an object is not initialized until it is needed. This prevents excessively long loading times when working with large files. It’s important to note that, as of molli 1.2, ordered library objects are not supported, for speed reasons.
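To make the idea of lazy loading concrete, here is a minimal sketch using only the Python standard library. The LazyLibrary class and its n_deserialized counter are hypothetical names invented for this illustration, and this is not how molli implements its libraries. It only shows the general pattern: objects stay serialized as bytes and are rebuilt one at a time when requested.

```python
import pickle


class LazyLibrary:
    """Toy stand-in for a lazily loaded library (not molli's implementation)."""

    def __init__(self, items):
        # Store every object as serialized bytes; nothing is deserialized yet.
        self._blobs = {name: pickle.dumps(obj) for name, obj in items.items()}
        # Counts how many objects have actually been rebuilt.
        self.n_deserialized = 0

    def __iter__(self):
        # Iterating yields names only, which is cheap.
        return iter(self._blobs)

    def __getitem__(self, name):
        # The object is reconstructed only when it is requested.
        self.n_deserialized += 1
        return pickle.loads(self._blobs[name])


lib = LazyLibrary({"mol_a": {"formula": "C2 H6"}, "mol_b": {"formula": "C6 H6"}})
names = list(lib)      # no objects rebuilt so far
first = lib["mol_a"]   # exactly one object rebuilt
print(names, first, lib.n_deserialized)
```

Listing the names costs nothing beyond reading the keys, so the cost of opening the library does not grow with the size of the stored objects.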

MoleculeLibrary

This is a collection of Molecule objects serialized in a binary format. Because loading is lazy, the library is accessed through stream-like operations. Note that the library will not show the number of items it contains until it is opened inside a with mlib.reading() statement.
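The behavior described here, where the item count is only available inside the reading block, can be pictured with a small toy class. ToyLibrary and its _is_open flag are assumptions made up for this sketch; molli's real classes are backed by files and may work differently internally.

```python
from contextlib import contextmanager


class ToyLibrary:
    """Hypothetical illustration only; not molli's MoleculeLibrary."""

    def __init__(self, data):
        self._data = data
        self._is_open = False

    @contextmanager
    def reading(self):
        # Open the "backend" on entry and always close it on exit.
        self._is_open = True
        try:
            yield self
        finally:
            self._is_open = False

    def __repr__(self):
        # The item count is only reported while the backend is open.
        n = len(self._data) if self._is_open else "unknown"
        return f"ToyLibrary(n_items={n})"


toy = ToyLibrary({"a": 1, "b": 2})
before = repr(toy)
with toy.reading():
    during = repr(toy)
print(before)
print(during)
```

The try/finally in reading() guarantees the toy backend is marked closed even if the body of the with block raises.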

Reading Example of MoleculeLibrary

# Imports molli
import molli as ml

#This is the path for an existing MoleculeLibrary
mlib_path = ml.files.cinchonidine_no_conf

#This instantiates the MoleculeLibrary to be read
mlib = ml.MoleculeLibrary(mlib_path, overwrite=False, readonly=True)

#This reads the MoleculeLibrary
with mlib.reading():

    print(mlib)

    #This iterates through the names in the library
    for name in mlib:
        #This retrieves the associated Molecule Object
        mol = mlib[name]

print(f'Here is an example name: {name}')
print(f"Here is its associated molecule object: {mol}")
MoleculeLibrary(backend=UkvCollectionBackend('/home/blakeo2/new_molli/molli_dev/molli/molli/files/cinchonidine.mlib'), n_items=88)
Here is an example name: 3_3_c_cf0
Here is its associated molecule object: Molecule(name='3_3_c_cf0', formula='C33 H37 N2 O1')

Writing Example of MoleculeLibrary

Writing is slightly different. To serialize an existing Molecule object, the syntax is as follows:

#This loads a molecule of interest
mol = ml.load(ml.files.dendrobine_mol2)

#This instantiates a MoleculeLibrary to be written to
new_mlib = ml.MoleculeLibrary('New Path', overwrite=False, readonly=False)

#This prepares the molecule library
with new_mlib.writing():

    #This serializes the molecule to the MoleculeLibrary using its existing name as the retrieval key
    new_mlib[mol.name] = mol

Reading and Writing Multiple Libraries

If you want to operate on the contents of an existing library and serialize the results, one library can be read while another is written within the same with statement:

#This is the path for an existing MoleculeLibrary
mlib_path = ml.files.cinchonidine_no_conf

#This instantiates the MoleculeLibrary to be read
mlib = ml.MoleculeLibrary(mlib_path, overwrite=False, readonly=True)

#This instantiates a MoleculeLibrary to be written to
new_mlib = ml.MoleculeLibrary('New Path', overwrite=False, readonly=False)

#This prepares the molecule libraries

with mlib.reading(), new_mlib.writing():

    #This iterates through the serialized mlib
    for name in mlib:
        #This instantiates the Molecule Object
        mol = mlib[name]

        #This translates the molecule 50 units in the x direction
        mol.translate([50,0,0])

        #This serializes this into a new molecule library
        new_mlib[name] = mol

This can be done with as many molecule libraries as desired, so a single pass over one library can feed several output libraries, each with its own processing.
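When the number of libraries is not known in advance, one option is contextlib.ExitStack from the Python standard library, which enters any number of context managers and closes them all on exit. The sketch below uses a hypothetical in-memory ToyLibrary so that it runs without molli. With real libraries, the assumption is that the objects returned by mlib.reading() and new_mlib.writing() can be passed to enter_context in the same way, since they are already used in with statements.

```python
from contextlib import ExitStack, contextmanager


class ToyLibrary:
    """Hypothetical in-memory stand-in exposing reading() and writing()."""

    def __init__(self, data=None):
        self.data = dict(data or {})
        self.is_open = False

    @contextmanager
    def _session(self):
        # Mark the library open for the duration of the with block.
        self.is_open = True
        try:
            yield self
        finally:
            self.is_open = False

    def reading(self):
        return self._session()

    def writing(self):
        return self._session()


source = ToyLibrary({"a": 1.0, "b": 2.0})
targets = [ToyLibrary() for _ in range(3)]

with ExitStack() as stack:
    # Enter one reading context and an arbitrary number of writing contexts.
    stack.enter_context(source.reading())
    for lib in targets:
        stack.enter_context(lib.writing())

    # Each target receives the source value shifted by its own index.
    for name in source.data:
        for shift, lib in enumerate(targets):
            lib.data[name] = source.data[name] + shift

# After the block, every library has been closed again.
all_closed = not any(lib.is_open for lib in [source, *targets])
print(targets[2].data, all_closed)
```

ExitStack unwinds the contexts in reverse order of entry, so the output libraries are closed before the source library.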

ConformerLibrary

ConformerLibrary objects have exactly the same functionality and syntax as MoleculeLibrary objects. The only difference is that they are made up of ConformerEnsemble objects serialized in a binary format. Since the syntax is the same, this notebook only gives a reading example, but the same writing and simultaneous reading and writing functionality exists.

Reading Example of ConformerLibrary

#This is the path for an existing ConformerLibrary
clib_path = ml.files.cinchonidine_rd_conf

#This instantiates the ConformerLibrary to be read
clib = ml.ConformerLibrary(clib_path, overwrite=False, readonly=True)

#This reads the ConformerLibrary
with clib.reading():

    print(clib)

    #This iterates through the names in the library
    for name in clib:
        #This retrieves the associated ConformerEnsemble Object
        ens = clib[name]

print(f'Here is an example name: {name}')
print(f"Here is its associated ConformerEnsemble object: {ens}")
ConformerLibrary(backend=UkvCollectionBackend('/home/blakeo2/new_molli/molli_dev/molli/molli/files/cinchonidine_rdconfs.clib'), n_items=88)
Here is an example name: 3_6_c
Here is its associated ConformerEnsemble object: ConformerEnsemble(name='3_6_c', formula='C33 H34 F3 N2 O1', n_conformers=200)