Mechanism Matching

I wanted to briefly document how I implemented mechanism matching in OpenOChem.

When you draw a reaction mechanism or structure in ChemDoodle, the API lets you export the structure(s) in either Molfile format or ChemDoodle's JSON format.  I use both.  The Molfile describes only the molecular structures, while the JSON describes the structures along with the curved-arrow topology, bonds, etc.

I take the Molfile structures and “canonize” the atoms with RDKit.  Initially I used the CanonicalRankAtoms method but didn't like the indexing, so I decided to use atom.GetProp('_CIPRank'), which provides the Cahn-Ingold-Prelog ranking of the atoms.  I may need to revisit this after more real-world testing.  I convert all the Molfiles to canonical SMILES using Open Babel and then put the SMILES strings in alphabetical order, so the list of structures is canonical as well.  I could have used RDKit for this too, but Open Babel has proven to be very reliable.

From here I map the JSON atom positions to the CIP ordering, and do the same for the curved arrows and the bond start/end points.  Bond data is important because curved arrows can start or end at a bond.  All this leads to a hash table of canonized curved-arrow data.  If the curved-arrow data and the SMILES structures are the same, then the mechanisms are the same.
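The final comparison step can be sketched in plain Python.  This is only an illustration under stated assumptions, not the actual OpenOChem code: the endpoint tuples, the `rank` dictionary, and the function names are all hypothetical, and the input layout is not ChemDoodle's real JSON schema.  It assumes the drawing's atom indices have already been mapped to canonical ranks (for example from `_CIPRank`) and that those ranks are unique across the whole drawing.

```python
import hashlib
import json


def canonical_endpoint(endpoint, rank):
    """Rewrite one arrow endpoint in terms of canonical atom ranks.

    `endpoint` is a hypothetical tuple: ("atom", i) or ("bond", i, j),
    where i and j are atom indices from the drawing.
    `rank` maps a drawing atom index to its canonical rank.
    """
    kind = endpoint[0]
    if kind == "atom":
        return ("atom", rank[endpoint[1]])
    if kind == "bond":
        # A bond has no direction, so sort its two atom ranks.
        a, b = sorted((rank[endpoint[1]], rank[endpoint[2]]))
        return ("bond", a, b)
    raise ValueError(f"unknown endpoint kind: {kind!r}")


def mechanism_key(smiles_list, arrows, rank):
    """Build an order-independent key for a drawn mechanism.

    `smiles_list` holds canonical SMILES strings and `arrows` is a list
    of (start, end) endpoint pairs.  Both are sorted so that the order
    in which things were drawn does not affect the key.
    """
    canon_smiles = sorted(smiles_list)
    canon_arrows = sorted(
        (canonical_endpoint(start, rank), canonical_endpoint(end, rank))
        for start, end in arrows
    )
    payload = json.dumps({"smiles": canon_smiles, "arrows": canon_arrows})
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two drawings of the same step, with atoms and arrows entered in a
# different order (made-up data for illustration).
answer_key = mechanism_key(
    ["[OH-]", "CBr"],
    [(("atom", 0), ("atom", 1)), (("bond", 1, 2), ("atom", 2))],
    {0: 0, 1: 1, 2: 2},
)
student_key = mechanism_key(
    ["CBr", "[OH-]"],
    [(("bond", 0, 1), ("atom", 0)), (("atom", 2), ("atom", 1))],
    {0: 2, 1: 1, 2: 0},
)
```

Comparing `answer_key` with `student_key` then stands in for comparing the two mechanisms.  Sorting the bond's atom ranks matters because the same bond can be stored with its atoms in either order.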