RDKit

This is the pydoc code for the rdkit module.

# molli.external.rdkit

This module defines necessary functions (albeit not a complete set) to interface with RDKit.

molli.external.rdkit.to_rdmol(m: Molecule, via='sdf', remove_hs=True, set_atts=False, raise_kekulize=True) PropertyMol

Converts an existing Molli Molecule object to an RDKit object. Molli is used if the format is xyz or mol2; otherwise the function falls back to attempting to parse with Open Babel. Currently supported formats are xyz, mol2, pdb, sdf, mol, and smi. When SMILES is used, the function also attempts to maintain the coordinates of the original structure.

Parameters:
  • m (ml.Molecule) – Molecule object to be converted

  • via (str, optional) – Extension format to use, by default ‘sdf’

  • remove_hs (bool, optional) – Removes hydrogens from the representation when creating RDKit mol, by default True

  • set_atts (bool, optional) – Prototype that attempts to set attributes that exist within a Molli Molecule, including atoms, bonds, and full molecule, by default False

  • raise_kekulize (bool, optional) – Can be used to prevent instantiation if the Molecule fails to kekulize, by default True

Returns:

RDKit Mol capable of being serialized

Return type:

PropertyMol
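
The format rule in the description of to_rdmol can be sketched in a few lines. This is only an illustration of the documented behavior, not Molli's actual code: the names `SUPPORTED_FORMATS`, `MOLLI_NATIVE_FORMATS`, and `choose_backend` are hypothetical, and raising `ValueError` for an unsupported format is an assumption.

```python
# Hypothetical sketch of the format rule documented for to_rdmol.
# None of these names come from Molli itself.
SUPPORTED_FORMATS = ("xyz", "mol2", "pdb", "sdf", "mol", "smi")
MOLLI_NATIVE_FORMATS = ("xyz", "mol2")


def choose_backend(via: str = "sdf") -> str:
    """Return which tool would handle the intermediate format."""
    if via not in SUPPORTED_FORMATS:
        # Assumption: unsupported formats are rejected outright.
        raise ValueError(f"Unsupported format: {via!r}")
    return "molli" if via in MOLLI_NATIVE_FORMATS else "openbabel"


print(choose_backend("mol2"))  # molli
print(choose_backend())        # openbabel
```

Under this reading, the default via='sdf' goes through the Open Babel path, while xyz and mol2 stay within Molli.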

molli.external.rdkit.from_rdmol(rdmol: PropertyMol | Mol) Molecule

Converts an RDKit Mol into a Molli Molecule using Open Babel and RDKit's MolToMolBlock.

Parameters:

rdmol (PropertyMol | Chem.Mol) – RDKit Molecule Object

Returns:

Molli Molecule Object

Return type:

ml.Molecule

molli.external.rdkit.rd_visualize(file_path: str, rd_list: list, subImgSize=(250, 250), legendFontSize=30, molsPerRow=5, legend_prop: str = '_Name', highlight_atom_prop: str = None, highlight_bond_prop: str = None) None

This visualizes a list of RDKit Molecule objects as an SVG. Atoms and Bonds can be highlighted if properties are assigned to the individual Atoms and Bonds.

Parameters:
  • file_path (str) – Path for SVG to be written to

  • rd_list (list) – List of RDKit Molecule Objects

  • subImgSize (tuple, optional) – Controls the size of Individual Molecule Images, by default (250, 250)

  • legendFontSize (int, optional) – Font Size of the Legend, by default 30

  • molsPerRow (int, optional) – Number of Molecules visualized per row, by default 5

  • legend_prop (str, optional) – Property of RDKit Molecule used to label molecules on the grid, by default “_Name”

  • highlight_atom_prop (str, optional) – Property of RDKit Atoms used to identify which atoms should be highlighted, by default None

  • highlight_bond_prop (str, optional) – Property of RDKit Bonds used to identify which bonds should be highlighted, by default None

Return type:

None
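
How subImgSize and molsPerRow combine into the final canvas can be worked out by hand. The sketch below assumes molecules are placed row by row on a fixed grid; whether rd_visualize uses exactly this layout is not stated here, and `grid_canvas_size` is a hypothetical helper, not part of Molli or RDKit.

```python
import math


def grid_canvas_size(n_mols, subImgSize=(250, 250), molsPerRow=5):
    """Return (rows, width, height) for a row-by-row grid of molecule images."""
    if n_mols < 0 or molsPerRow < 1:
        raise ValueError("n_mols must be >= 0 and molsPerRow >= 1")
    rows = math.ceil(n_mols / molsPerRow)  # a partially filled row still counts
    width = molsPerRow * subImgSize[0]
    height = rows * subImgSize[1]
    return rows, width, height


# 12 molecules with the documented defaults: 3 rows on a 1250 x 750 canvas.
print(grid_canvas_size(12))  # (3, 1250, 750)
```

Raising molsPerRow widens the canvas and reduces the number of rows, which is usually the first knob to adjust when legends start to overlap.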

molli.external.rdkit.ml_rd_visualize(file_path: str, obj: Molecule | MoleculeLibrary | list, via: str = 'sdf', remove_hs=True, set_atts=True, raise_kekulize=True, subImgSize=(250, 250), legendFontSize=30, molsPerRow=5, legend_prop: str = '_Name', highlight_atom_prop: str = None, highlight_bond_prop: str = None) None

This visualizes a list, Molecule, or MoleculeLibrary via RDKit’s SVG viewer and Molli’s implementation. Atoms and Bonds can be highlighted if properties are assigned to the individual Atoms and Bonds of the RDKit object.

Parameters:
  • file_path (str) – Path for SVG to be written to

  • obj (ml.Molecule | ml.MoleculeLibrary | list) – Input to convert to RDKit Object

  • via (str, optional) – Extension format to use, by default ‘sdf’

  • remove_hs (bool, optional) – Removes hydrogens from the representation when creating RDKit mol, by default True

  • set_atts (bool, optional) – Prototype that attempts to set attributes that exist within a Molli Molecule, including atoms, bonds, and full molecule, by default True

  • raise_kekulize (bool, optional) – Can be used to prevent instantiation if the Molecule fails to kekulize, by default True

  • subImgSize (tuple, optional) – Controls the size of Individual Molecule Images, by default (250, 250)

  • legendFontSize (int, optional) – Font Size of the Legend, by default 30

  • molsPerRow (int, optional) – Number of Molecules visualized per row, by default 5

  • legend_prop (str, optional) – Property of RDKit Molecule used to label molecules on the grid, by default “_Name”

  • highlight_atom_prop (str, optional) – Property of RDKit Atoms used to identify which atoms should be highlighted, by default None

  • highlight_bond_prop (str, optional) – Property of RDKit Bonds used to identify which bonds should be highlighted, by default None

molli.external.rdkit.canonicalize_rdmol(rdmol: PropertyMol, remove_hs=False) PropertyMol

Returns a canonicalized RDKit Mol generated from a canonical RDKit SMILES string. The current implementation only adds the “_Name” property.

Parameters:
  • rdmol (PropertyMol) – RDKit Mol to canonicalize

  • remove_hs (bool, optional) – Removes Hydrogens upon canonicalization, by default False

Returns:

Returned RDKit Molecule

Return type:

PropertyMol

molli.external.rdkit.can_mol_order(rdmol: PropertyMol) tuple[PropertyMol, list[int], list[int]]

This function tries to match the indices of the canonicalized SMILES string/molecular graph to a Molli Molecule object.

This function AUTOMATICALLY ADDS HYDROGENS (makes them explicit) to any RDKit mol object passed to it.

Important Notes:
  • The returned mol will only have “_Kekulize_Issue” if the initial object had this property set (i.e. if it ran into an issue during the initial instantiation)

  • The canonical RDKit mol object will have the canonical SMILES with hydrogens available as the property “_Canonical_SMILES_w_H”

  • Some properties may be missing because the PropertyCache is not updated on the new canonicalized mol object, so consider calling rdmol.UpdatePropertyCache() if you want to continue using the mol object

Parameters:

rdmol (PropertyMol) – RDKit Mol to canonicalize

Returns:
  1. Canonical RDKit Mol Object with Hydrogens

  2. A list for reordering the Atom Indices after canonicalization

  3. A list for reordering the Bond Indices after canonicalization

Return type:

tuple[PropertyMol, list[int], list[int]]

molli.external.rdkit.reorder_molecule(ml_mol: Molecule, can_rdmol_w_h: PropertyMol, can_atom_reorder: list, can_bond_reorder: list) dict[Molecule, PropertyMol]

This function uses the outputs of can_mol_order to reorder an existing molecule. The reordering is currently done in place on the original ml_mol object.

Parameters:
  • ml_mol (Molecule) – Molli Molecule Object to be reordered

  • can_rdmol_w_h (PropertyMol) – Canonical RDKit Object to be matched with

  • can_atom_reorder (list) – List of integers associated with the atom reordering

  • can_bond_reorder (list) – List of integers associated with the bond reordering

Returns:

Dictionary linking Molli Molecule Object to RDKit Object

Return type:

dict[ml.Molecule, PropertyMol]
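
To make the role of the reordering lists concrete, the sketch below applies an index list to a toy atom list and remaps bond endpoints to match. It assumes each entry of the list is the old index of the atom that moves to that position (the convention of RDKit's Chem.RenumberAtoms); the text above does not state which convention can_atom_reorder follows. `apply_order` and `remap_bonds` are hypothetical helpers, not Molli functions.

```python
def apply_order(items, order):
    """Return items rearranged so that new position i holds items[order[i]]."""
    if sorted(order) != list(range(len(items))):
        raise ValueError("order must be a permutation of the item indices")
    return [items[old] for old in order]


def remap_bonds(bonds, atom_order):
    """Rewrite (i, j) atom-index pairs to match the reordered atom list."""
    new_index = {old: new for new, old in enumerate(atom_order)}
    return [(new_index[a], new_index[b]) for a, b in bonds]


atoms = ["C", "O", "N"]
bonds = [(0, 1), (0, 2)]  # C-O and C-N
order = [2, 0, 1]         # assumed convention: old index at each new position

print(apply_order(atoms, order))  # ['N', 'C', 'O']
print(remap_bonds(bonds, order))  # [(1, 2), (1, 0)]
```

After reordering, the carbon sits at index 1, and both bonds still connect it to the oxygen (index 2) and the nitrogen (index 0), so connectivity is preserved.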

class molli.external.rdkit.atom_filter(rdmol)

Bases: PropertyMol

These methods return numpy boolean arrays so that types of atoms can be isolated easily. The intended usage pattern is:

isolated_atoms = (aromatic_type() & sp2_type() & carbon_type())

Each method of this class creates a boolean array the size of the “All Atoms” array, intersects it with a second array (the condition being tested), and returns the result: [(1 & 2 & …)]

It is recommended that RDKit molecules are canonicalized before using this class.
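
The mask intersection pattern can be illustrated with plain NumPy arrays. The per-atom flags below are made up for a toy six-atom example and are not produced by atom_filter; the sketch only shows how boolean arrays of equal length intersect element-wise.

```python
import numpy as np

# Toy per-atom flags for six atoms, indexed 0..5 (made-up values).
is_aromatic = np.array([True, True, True, False, False, True])
is_sp2 = np.array([True, True, True, True, False, True])
is_carbon = np.array([True, True, False, True, True, False])

# Element-wise AND keeps only atoms that satisfy every condition.
isolated_atoms = is_aromatic & is_sp2 & is_carbon

print(isolated_atoms.tolist())                  # [True, True, False, False, False, False]
print(np.flatnonzero(isolated_atoms).tolist())  # [0, 1]
```

Because every mask has one entry per atom in the same order, the result can be used directly to index any other per-atom array.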

sp2_type()

This takes a numpy array of Atom IDs and returns a boolean for where SP2 atoms exist. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

aromatic_type()

This takes a numpy array of Atom IDs and returns a boolean for where AROMATIC atoms exist. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

ring_type()

This takes a numpy array of Atom IDs and returns a boolean for where atoms IN A RING exist. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

carbon_type()

This takes a numpy array of Atom IDs and returns a boolean for where CARBON atoms exist. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

nitrogen_type()

This takes a numpy array of Atom IDs and returns a boolean for where NITROGEN atoms exist. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

oxygen_type()

This takes a numpy array of Atom IDs and returns a boolean for where OXYGEN atoms exist. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

atom_num_less_than(number: int)

This takes a numpy array of Atom IDs and returns a boolean for where the ATOM NUMBER is LESS than the input. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

atom_num_equals(number: int)

This takes a numpy array of Atom IDs and returns a boolean for where the ATOM NUMBER is EQUAL to the input. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

atom_num_greater_than(number: int)

This takes a numpy array of Atom IDs and returns a boolean for where the ATOM NUMBER is GREATER than the input. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.
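
The three atomic-number filters amount to element-wise comparisons. A minimal NumPy sketch with made-up atomic numbers (this does not call atom_filter, and the variable names are illustrative only):

```python
import numpy as np

# Atomic numbers for a toy atom list: C, N, O, H, Cl.
atomic_numbers = np.array([6, 7, 8, 1, 17])
number = 8

less_than = atomic_numbers < number
equals = atomic_numbers == number
greater_than = atomic_numbers > number

print(less_than.tolist())     # [True, True, False, True, False]
print(equals.tolist())        # [False, False, True, False, False]
print(greater_than.tolist())  # [False, False, False, False, True]
```

For any given number, exactly one of the three masks is True at each atom position.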

isotope_type_equals(number: int)

This takes a numpy array of Atom IDs and returns a boolean for where the ISOTOPE NUMBER is EQUAL to the input. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

charge_type_less_than(number: int)

This takes a numpy array of Atom IDs and returns a boolean for where the FORMAL CHARGE is LESS THAN the input. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

charge_type_equals(number: int)

This takes a numpy array of Atom IDs and returns a boolean for where the FORMAL CHARGE is EQUAL to the input. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

charge_type_greater_than(number: int)

This takes a numpy array of Atom IDs and returns a boolean for where the FORMAL CHARGE is GREATER THAN the input. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

hcount_less_than(number: int)

This takes a numpy array of Atom IDs and returns a boolean for where the HYDROGEN COUNT is LESS THAN the input. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

hcount_equals(number: int)

This takes a numpy array of Atom IDs and returns a boolean for where the HYDROGEN COUNT is EQUAL to the input. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

hcount_greater_than(number: int)

This takes a numpy array of Atom IDs and returns a boolean for where the HYDROGEN COUNT is GREATER THAN the input. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

in_ring()

This takes a numpy array of Atom IDs and returns a boolean for where atoms IN A RING exist. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

ring_size6()

This takes a numpy array of Atom IDs and returns a boolean for where atoms IN A 6-MEMBERED RING exist. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

ring_size5()

This takes a numpy array of Atom IDs and returns a boolean for where atoms IN A 5-MEMBERED RING exist. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

in_2_rings()

This takes a numpy array of Atom IDs and returns a boolean for where atoms IN 2 RINGS exist. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

in_1_ring()

This takes a numpy array of Atom IDs and returns a boolean for where atoms IN 1 RING exist. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

het_neighbors_3()

This takes a numpy array of Atom IDs and returns a boolean for where atoms have HETEROATOM NEIGHBORS = 3. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

het_neighbors_2()

This takes a numpy array of Atom IDs and returns a boolean for where atoms have HETEROATOM NEIGHBORS = 2. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

het_neighbors_1()

This takes a numpy array of Atom IDs and returns a boolean for where atoms have HETEROATOM NEIGHBORS = 1. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

het_neighbors_0()

This takes a numpy array of Atom IDs and returns a boolean for where atoms have HETEROATOM NEIGHBORS = 0. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

het_neighbors_greater_1()

This takes a numpy array of Atom IDs and returns a boolean for where atoms have HETEROATOM NEIGHBORS > 1. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

het_neighbors_greater_0()

This takes a numpy array of Atom IDs and returns a boolean for where atoms have HETEROATOM NEIGHBORS > 0. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

aliph_het_neighbors_2()

This takes a numpy array of Atom IDs and returns a boolean for where atoms are ALIPHATIC AND HAVE 2 HETEROATOM NEIGHBORS. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.

aliph_het_neighbors_1()

This takes a numpy array of Atom IDs and returns a boolean for where atoms are ALIPHATIC AND HAVE 1 HETEROATOM NEIGHBOR. Inputs to this function are built for an ORDERED LIST OF ALL ATOM IDs from LEAST TO GREATEST.
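
Counting heteroatom neighbors, which underlies the het_neighbors_* filters, can be sketched from a bond list. This assumes a heteroatom is any atom other than carbon or hydrogen, a common convention that is not spelled out above; the toy molecule and the helper name `het_neighbor_counts` are illustrative only.

```python
import numpy as np

# Heavy atoms of CH3-CH(NH2)-O-CH3 and their bonds as index pairs.
symbols = ["C", "C", "N", "O", "C"]
bonds = [(0, 1), (1, 2), (1, 3), (3, 4)]


def het_neighbor_counts(symbols, bonds):
    """Count, for every atom, the bonded neighbors that are not C or H."""
    is_het = [s not in ("C", "H") for s in symbols]
    counts = np.zeros(len(symbols), dtype=int)
    for a, b in bonds:
        counts[a] += is_het[b]  # b contributes to a's count if b is a heteroatom
        counts[b] += is_het[a]
    return counts


counts = het_neighbor_counts(symbols, bonds)
print(counts.tolist())         # [0, 2, 0, 0, 1]
print((counts == 2).tolist())  # [False, True, False, False, False]
print((counts > 0).tolist())   # [False, True, False, False, True]
```

The last two lines correspond in spirit to a "heteroatom neighbors = 2" mask and a "heteroatom neighbors > 0" mask over the ordered atom list.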