I wanted to briefly document how I implemented mechanism matching in OpenOChem.
When you draw a reaction mechanism or structure in ChemDoodle, the API lets you export the structure(s) either as a Molfile or as ChemDoodle's JSON format. I use both. The Molfile describes only the molecular structures, while the JSON describes the structures plus the curved arrow topology, bonds, etc.
I take the Molfile structures and canonicalize the atoms with RDKit. Initially I used the CanonicalRankAtoms method but didn't like the indexing, so I decided to use atom.GetProp('_CIPRank'), which provides the Cahn-Ingold-Prelog ranking of the atoms. I may need to revisit this after more real-world testing. I convert all the Molfiles to canonical SMILES using Open Babel and then sort the SMILES strings alphabetically so that the order of the structures is canonical as well. I could have used RDKit for this too, but Open Babel has proven to be very reliable.
From here I map the JSON atom positions to the CIP ordering, along with the curved arrows and the bond start/end points. Bond data is important because a curved arrow can start or end at a bond. All of this leads to a hash table of canonicalized curved arrow data. If the curved arrow data and the SMILES structures are the same, then the mechanisms are the same.
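To make the final comparison step concrete, here is a minimal sketch in plain Python. It is not the actual OpenOChem code: the function names, the tuple layout for arrow endpoints, and the example labels are all hypothetical, and it assumes the earlier steps have already produced a canonical SMILES string per structure and a canonical label for every drawn atom.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical representation (not ChemDoodle's JSON schema):
#   an endpoint is ("atom", atom_id) or ("bond", atom_id_a, atom_id_b)
#   an arrow is (start_endpoint, end_endpoint)
Endpoint = Tuple
Arrow = Tuple[Endpoint, Endpoint]


def canonical_endpoint(endpoint: Endpoint, label_of: Dict[int, Hashable]) -> Tuple:
    """Replace drawing-order atom ids with canonical labels.

    For a bond, the two labels are sorted so that the direction in which
    the bond was drawn does not matter.
    """
    kind, *atom_ids = endpoint
    labels = sorted(label_of[i] for i in atom_ids)
    return (kind, *labels)


def mechanism_signature(
    smiles: List[str],
    arrows: List[Arrow],
    label_of: Dict[int, Hashable],
) -> Tuple:
    """Build a hashable, order-independent description of one mechanism step."""
    canonical_arrows = sorted(
        (canonical_endpoint(start, label_of), canonical_endpoint(end, label_of))
        for start, end in arrows
    )
    # Sorting the SMILES strings makes the structure order canonical too.
    return (tuple(sorted(smiles)), tuple(canonical_arrows))


# Two drawings of the same step (hydroxide attacking bromomethane), with the
# atoms drawn in a different order. Labels are (SMILES, rank) pairs; the rank
# values here are made up for illustration.
label_a = {0: ("[OH-]", 1), 1: ("CBr", 1), 2: ("CBr", 2)}
arrows_a = [(("atom", 0), ("atom", 1)), (("bond", 1, 2), ("atom", 2))]
sig_a = mechanism_signature(["[OH-]", "CBr"], arrows_a, label_a)

label_b = {2: ("[OH-]", 1), 1: ("CBr", 1), 0: ("CBr", 2)}
arrows_b = [(("bond", 0, 1), ("atom", 0)), (("atom", 2), ("atom", 1))]
sig_b = mechanism_signature(["CBr", "[OH-]"], arrows_b, label_b)

# Same structures, but the first arrow points the wrong way.
arrows_wrong = [(("atom", 1), ("atom", 0)), (("bond", 1, 2), ("atom", 2))]
sig_wrong = mechanism_signature(["[OH-]", "CBr"], arrows_wrong, label_a)

print(sig_a == sig_b)      # True
print(sig_a == sig_wrong)  # False
```

Because the signature is built only from tuples and strings, it is hashable and can be used directly as a dictionary key, which fits the hash table idea described above.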