I'm trying to understand how to fetch recently published documents from PubMed. It's not as straightforward as you would think.
Mostly I'm confused by how PDAT works. In the XML, PubDate only has month and year (no day) for 90+% of the results, yet a search like
"2012/10/11"[PDAT] : "2012/10/31"[PDAT]
gives 39178 results while
"2012/10/19"[PDAT] : "2012/10/31"[PDAT]
gives 21679 results.
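For anyone who wants to reproduce these counts, here is a minimal sketch of how the two searches could be sent to the NCBI E-utilities ESearch endpoint. The helper names (pdat_term, esearch_count_url, parse_count, fetch_count) are my own and purely illustrative, and I'm assuming the response carries the total in a top-level Count element, which is what rettype=count returns as far as I can tell.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pdat_term(start, end):
    """Build the same PDAT range expression used in the searches above."""
    return f'"{start}"[PDAT] : "{end}"[PDAT]'

def esearch_count_url(start, end):
    """Return an ESearch URL that asks only for the number of hits."""
    params = {"db": "pubmed", "term": pdat_term(start, end), "rettype": "count"}
    return f"{ESEARCH}?{urlencode(params)}"

def parse_count(xml_text):
    """Extract the hit count from an ESearch XML response (assumed layout)."""
    return int(ET.fromstring(xml_text).findtext("Count"))

def fetch_count(start, end):
    """Run the search over the network and return the hit count."""
    with urllib.request.urlopen(esearch_count_url(start, end)) as resp:
        return parse_count(resp.read())

# Usage (needs network access; counts will have changed since 2012):
# fetch_count("2012/10/11", "2012/10/31")
# fetch_count("2012/10/19", "2012/10/31")
```

Nothing here explains the discrepancy; it only makes the two queries easy to rerun side by side.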
There's no Day field in the Article element of the PubMed XML for most articles (see the qplot of PubDate/Day extracted from the XML).
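This is roughly how the share of records with a Day element can be measured. It is a sketch only: SAMPLE is a made-up three-record stand-in for a real PubMed XML download, and I'm assuming PubDate sits under Article/Journal/JournalIssue, with some records using a free-text MedlineDate instead of Year/Month/Day.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not real PubMed records.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><Article><Journal><JournalIssue>
    <PubDate><Year>2012</Year><Month>Oct</Month><Day>19</Day></PubDate>
  </JournalIssue></Journal></Article></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><Article><Journal><JournalIssue>
    <PubDate><Year>2012</Year><Month>Oct</Month></PubDate>
  </JournalIssue></Journal></Article></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><Article><Journal><JournalIssue>
    <PubDate><MedlineDate>2012 Oct-Dec</MedlineDate></PubDate>
  </JournalIssue></Journal></Article></MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""

def day_coverage(xml_text):
    """Return (records whose PubDate has a Day child, total PubDate records)."""
    root = ET.fromstring(xml_text)
    pub_dates = root.findall(".//Article/Journal/JournalIssue/PubDate")
    with_day = sum(1 for d in pub_dates if d.find("Day") is not None)
    return with_day, len(pub_dates)

with_day, total = day_coverage(SAMPLE)
print(f"{with_day} of {total} PubDate elements have a Day")
```

On the sample above only the first record counts as having a day; the other two carry just a month or a MedlineDate range.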
If there is no Day field in PubDate for the vast majority of journals, why do these searches differ so much? I would expect a search for papers published between October 29th and October 31st to also return papers published in October with no publication day, and that does seem to be the case. So if the Day field is only present for about 10% of articles, the number of results for any date-range search within October should differ by at most about 10%, right?
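To put a number on the mismatch, here is the arithmetic using the two counts quoted above. The 10% threshold is just my own rough expectation from the reasoning in the previous paragraph, not anything PubMed documents.

```python
wide = 39178    # hits for 2012/10/11 to 2012/10/31
narrow = 21679  # hits for 2012/10/19 to 2012/10/31

# Fraction of the wider result set that disappears when the range is narrowed.
drop = (wide - narrow) / wide

# Rough upper bound I expected if only ~10% of records carry a day.
expected_max_drop = 0.10

print(f"observed drop: {drop:.1%}, expected at most about {expected_max_drop:.0%}")
```

The observed drop is far larger than the bound I expected, which is why I think PDAT must be matching on something other than the Day element in PubDate.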
I could use the created date instead, but for papers published this month the created dates range from June to now, presumably because some publishers submit their data well before the print issue comes out. So that's not really what I want.