ChemSpiPy - A Python wrapper for the ChemSpider API

I recently had the task of matching up a large amount of poorly organised molecular properties data with the corresponding structures, where the data was only identified by name. To make matters worse, the names were mostly an inconsistent mix of common names and trade names.

The best tool I’ve found for this kind of problem is ChemSpider - you can enter any type of chemical identifier into the simple search and it will attempt to resolve a structure for you. The success rate seems to be much higher than other services, and it has a web API, so the whole process can be automated and performed for thousands of structures at a time.

I was just about to write a Python interface to the API from scratch, when I came across ChemSpiPy by Cameron Neylon, a bare bones Python wrapper for the API. I made a few bug fixes and extended the functionality, so now you can easily search ChemSpider and retrieve properties and identifiers for chemical structures from your Python scripts.

You can download it from GitHub here. Cameron Neylon’s original version is also available.

Usage is pretty straightforward: download the chemspipy.py file and put it in the same directory as your Python script or somewhere on your Python search path. Then import it at the top of your script:

import chemspipy

To search ChemSpider with any kind of identifier, use the find function:

comp_list = chemspipy.find('Benzene')

This will return a list of Compound objects that each correspond to a ChemSpiderID. Alternatively, find_one will just return the best match.

c = chemspipy.find_one('Benzene')
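For a bulk matching job like the one described at the top of this post, the same call can be wrapped in a loop. What follows is only a sketch: match_names is a hypothetical helper, not part of ChemSpiPy, and it takes the lookup function as an argument so that chemspipy.find_one can be passed in. It assumes that a failed lookup either returns None or raises an exception.

```python
def match_names(names, lookup):
    """Look up each name once and split the results into hits and misses.

    `lookup` is any callable that takes a name and returns a match,
    for example chemspipy.find_one (assumed here to return None or
    raise an exception when nothing is found).
    """
    matched = {}
    unmatched = []
    for raw in names:
        name = raw.strip()
        if not name or name in matched or name in unmatched:
            continue  # skip blanks and names that were already tried
        try:
            result = lookup(name)
        except Exception:
            result = None  # treat any lookup error as "no match"
        if result is None:
            unmatched.append(name)
        else:
            matched[name] = result
    return matched, unmatched
```

Calling match_names(list_of_names, chemspipy.find_one) would then give you a dictionary of resolved names plus a list of names that still need checking by hand.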

If you already know the ChemSpiderID of a compound, you can simply use it to initialise a Compound object.

from chemspipy import Compound

c = Compound(236)

Compound objects have the following properties:

c.csid
c.imageurl
c.mf
c.smiles
c.inchi
c.inchikey
c.averagemass
c.molecularweight
c.monoisotopicmass
c.nominalmass
c.alogp
c.xlogp
c.commonname
c.image
c.mol

These are all retrieved lazily from ChemSpider only when requested to avoid unnecessary calls to the API. More details about what the API returns are available in the ChemSpider API Documentation. Before using the service, you’ll need to create an account on ChemSpider to get a security token.
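To illustrate what lazy retrieval means in practice, here is a minimal sketch of the general pattern. It is not the actual ChemSpiPy code: LazyRecord and the fetch callable are hypothetical names made up for this example, and the real wrapper may well organise things differently.

```python
class LazyRecord:
    """Fetch each field on first access, then reuse the stored value."""

    def __init__(self, record_id, fetch):
        self.record_id = record_id
        self._fetch = fetch  # hypothetical callable(record_id, field) -> value

    def __getattr__(self, name):
        # Python only calls __getattr__ when normal attribute lookup fails,
        # i.e. the first time a given field is requested.
        if name.startswith('_'):
            raise AttributeError(name)
        value = self._fetch(self.record_id, name)
        setattr(self, name, value)  # store it so later accesses skip the fetch
        return value
```

With this pattern, creating the object costs nothing, the first access to something like record.smiles triggers one request, and repeated accesses reuse the stored value.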

Antony Williams on ChemSpider at #ACSDenver

Antony Williams presented “ChemSpider: Does Community Engagement Work to Build a Quality Online Resource for Chemists?” at the 242nd American Chemical Society National Meeting today.  This is just one of his five presentations on ChemSpider this week.  He noted at the end of the session that the presentation will be on his SlideShare page soon.

In the presentation, he noted that, according to Amazon, he has supposedly written two books.  One is Collaborative Computational Technologies for Biomedical Research, and the other is I Hate Sex, but this may be a case for author disambiguation.  Maybe there is another Anthony J. Williams?

Throughout the presentation, he noted how much you can’t trust data from many supposedly reputable sources, but the staff at ChemSpider work to double and triple check their sources.  They work with about 400 outside suppliers of chemical data, and many data points do not match up.  Many data suppliers get their data from other sources, so errors are often repeated simply through redundancy.

Letting “the crowd” fix errors doesn’t really work because the interested crowd in chemistry is pretty small.

He mentioned many other interesting projects such as the Spectral Game, SpectraSchool, Open PHACTS, and the ChemSpider Synthetic Pages.  To date, they have only had a little over 130 people contribute to this freely available interactive database of synthetic chemistry, and they would like more people to submit their data.

If you want more information on Antony Williams, you can also follow him on his twitter account or read his personal blog.

CIRpy - A Python interface for the Chemical Identifier Resolver (CIR)

In the past I have used the ChemSpider API (through ChemSpiPy) to resolve chemical names to structures. Unfortunately this doesn’t work that well for IUPAC names and I found myself wondering whether it was worth setting up a system that would try a number of different resolvers. More specifically, I wanted a system that would first try using OPSIN to match IUPAC names, and if that failed, try a ChemSpider lookup. Just as I was about to start doing this myself, I came across the Chemical Identifier Resolver (CIR) that does exactly that (and much more).

CIR is a web service created by the CADD Group at the NCI that performs various chemical name to structure conversions. In short, it will (attempt to) resolve the structure of any chemical identifier that you throw at it. Under the hood it uses a combination of OPSIN, ChemSpider and CIR’s own database.
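The "try one resolver, then fall back to the next" idea can be sketched in a few lines. This only illustrates the pattern I had in mind and says nothing about how CIR is implemented internally; resolve_with_fallback and the resolver functions passed to it are hypothetical placeholders.

```python
def resolve_with_fallback(identifier, resolvers):
    """Try each (label, function) pair in order and return the first hit.

    Each function takes the identifier and returns a structure string,
    or returns None (or raises) when it cannot resolve the input.
    """
    for label, resolver in resolvers:
        try:
            result = resolver(identifier)
        except Exception:
            continue  # this resolver failed, so move on to the next one
        if result:
            return label, result
    return None, None  # nothing in the chain could resolve the identifier
```

The order of the list sets the priority, so a name parser would go first and a database lookup second. Returning the label alongside the result lets you record which source each structure came from.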

To simplify interacting with CIR through Python, I wrote a simple wrapper called CIRpy that handles constructing URL requests and parsing XML responses. It’s available on GitHub here.
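As a rough idea of what constructing a request involves, here is a sketch of the URL-building step. The base URL and the identifier/representation/xml path layout are my assumptions about the CIR service, and build_cir_url is a hypothetical helper rather than a function from CIRpy.

```python
from urllib.parse import quote

# Assumed base URL of the CIR service.
API_BASE = 'https://cactus.nci.nih.gov/chemical/structure'


def build_cir_url(identifier, representation, xml=True):
    """Return the request URL for one identifier/representation pair."""
    # Percent-encode characters such as spaces and '#' in the identifier.
    url = '%s/%s/%s' % (API_BASE, quote(identifier), representation)
    if xml:
        url += '/xml'  # assumed suffix that asks for an XML response
    return url
```

Encoding matters here because chemical names often contain spaces, and SMILES strings can contain characters such as '#' that would otherwise be misread as part of the URL syntax.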

Using it is a simple case of copying cirpy.py into a directory on your Python path. Here’s an example using the resolve function:

import cirpy

smiles_string = cirpy.resolve('Aspirin', 'smiles')

There are full details of all available options in the readme.
