PubMed ID or DOI? And given one of those identifiers, which makes it easier to retrieve an article's metadata?
Neither is perfect by any means.
Let's take the example of one of my papers, "An integrated dataset for in silico drug discovery".
Assume we have its two identifiers, the PMID (20375448) and the DOI (10.2390/biecoll-jib-2010-116), and we want the full text and metadata for the article.
Full text first. Let's use the PMID and visit the PubMed entry for the paper, here: http://pubmed.org/20375448.
Lo and behold, there is no link through to the article at all. We've hit a dead end.
So let's try the DOI: http://dx.doi.org/10.2390/biecoll-jib-2010-116.
Hurrah, at least there is a link to the full text, even if the DOI doesn't take us through to the actual article itself, but still... success!
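As a rough sketch of the two lookups so far, each identifier can be turned into the page address used above. The function names are my own invention for illustration, and the URL patterns are simply the ones quoted in this post, so they may redirect or change over time.

```python
from urllib.parse import quote


def pubmed_page_url(pmid: str) -> str:
    """Address of the PubMed entry, following the pattern used in this post."""
    return "http://pubmed.org/" + quote(pmid.strip(), safe="")


def doi_resolver_url(doi: str) -> str:
    """Address that asks the DOI resolver to redirect to the publisher."""
    # Keep "/" unescaped so the DOI prefix/suffix separator stays readable.
    return "http://dx.doi.org/" + quote(doi.strip(), safe="/")


print(pubmed_page_url("20375448"))
print(doi_resolver_url("10.2390/biecoll-jib-2010-116"))
```

Neither function fetches anything. Whether the page behind the address actually links to the article is exactly the part that varies from paper to paper.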
OK, now metadata. The DOI gave us success with the article, so let's try that first. Search the CrossRef metadata here: http://api.labs.crossref.org/search?q=10.2390%2Fbiecoll-jib-2010-116.
Hmmm, nothing about my paper at all in that list (this is because the CrossRef database is curiously free of information about that paper, despite the DOI resolving to the paper).
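For anyone scripting that search, here is a minimal sketch of how the query address is built. The endpoint is the labs one quoted above and I'm assuming nothing about what it returns; the function name is hypothetical.

```python
from urllib.parse import urlencode

# Search endpoint exactly as quoted in this post (an experimental "labs" service).
CROSSREF_SEARCH = "http://api.labs.crossref.org/search"


def crossref_search_url(doi: str) -> str:
    """Build the search address for a DOI used as a free-text query."""
    # urlencode percent-escapes the "/" inside the DOI as %2F.
    return CROSSREF_SEARCH + "?" + urlencode({"q": doi})


print(crossref_search_url("10.2390/biecoll-jib-2010-116"))
```

The only subtlety is the escaping: the slash in the DOI has to become `%2F` once it moves from the path into a query parameter.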
So, back to the PMID, and eutils, here: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=20375448.
Yay, that looks like suitable metadata (even if it is in a completely different format to that we would get back from CrossRef, so any parser we write to consume CrossRef metadata will break). But why could we not retrieve this with the DOI?
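To make the format problem concrete, here is a small sketch that builds the efetch address and pulls a few fields out of PubMed-style XML into a plain dictionary. The sample XML is a hand-written fragment in the general shape of a PubMed record, not the actual response for my paper, and the function names and the choice of fields are my own assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EFETCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmid: str) -> str:
    """Build the efetch address for one PubMed record, returned as XML."""
    return EFETCH + "?" + urlencode({"db": "pubmed", "retmode": "xml", "id": pmid})


def parse_pubmed_xml(xml_text: str) -> dict:
    """Extract PMID, title and DOI (if present) from PubMed-style XML."""
    root = ET.fromstring(xml_text)
    doi_node = root.find(".//ArticleIdList/ArticleId[@IdType='doi']")
    return {
        "pmid": root.findtext(".//MedlineCitation/PMID"),
        "title": root.findtext(".//Article/ArticleTitle"),
        "doi": doi_node.text if doi_node is not None else None,
    }


# Illustrative fragment only: trimmed and written by hand for this example.
SAMPLE = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>20375448</PMID>
      <Article>
        <ArticleTitle>An integrated dataset for in silico drug discovery.</ArticleTitle>
      </Article>
    </MedlineCitation>
    <PubmedData>
      <ArticleIdList>
        <ArticleId IdType="pubmed">20375448</ArticleId>
        <ArticleId IdType="doi">10.2390/biecoll-jib-2010-116</ArticleId>
      </ArticleIdList>
    </PubmedData>
  </PubmedArticle>
</PubmedArticleSet>
"""

print(efetch_url("20375448"))
print(parse_pubmed_xml(SAMPLE))
```

Returning a plain dictionary is deliberate: a CrossRef parser could fill the same keys from its own format, so the rest of a program would not need to care which service answered.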
I'm not for a second suggesting that the situation is this poor for every article (mostly it depends on the publisher and what they submit to the various databases), but the proportion of articles like this is significant enough for it to be problematic.
Publishers should be providing good enough metadata on the articles themselves for this ad hoc system of arbitrary third-party identifiers to be completely unnecessary.
Check out Rod's blog, iPhylo, for some interesting posts too!