I need to get full-text articles as well as their MeSH terms from PubMed Central using Biopython's implementation of the E-utilities. So far, I have:
from Bio import Entrez

# Run the search and keep the result set on NCBI's history server
search_results = Entrez.read(Entrez.esearch(db="pmc",
                                            term=search_query,
                                            retmax=10,
                                            usehistory="y"))
My search query is such that I get only open access MEDLINE articles about some subject from the PubMed Central database. When I download articles, I use efetch like so:
handle = Entrez.efetch(db="pmc",
                       rettype="full",
                       retmode="xml",
                       retstart=start,
                       retmax=max,
                       webenv=search_results["WebEnv"],
                       query_key=search_results["QueryKey"])
So in my experience, the only way to get full text is with retmode="xml". Setting rettype="full" or rettype="medline" doesn't seem to change much. My problem is that I can't seem to get MeSH terms with these settings, and I can't seem to get the full text with any other settings. Do you know if I'm missing something? Are MeSH terms not in a <MeshHeadingList> tag? Do PMC's open access articles not have MeSH terms associated with them?
First search db=pubmed, get the MeSH terms and extract the PMC identifier, then download the PMC article using another efetch.
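A minimal sketch of the parsing half of that suggestion, assuming the XML comes from an efetch call with db="pubmed" and retmode="xml". The function name `mesh_and_pmcid` and the sample record (PMID, PMC ID, MeSH terms) are made up for illustration; only the element names follow the PubMed XML layout.

```python
import xml.etree.ElementTree as ET


def mesh_and_pmcid(pubmed_xml):
    """Map each PMID to its MeSH descriptor names and its PMC ID (None if absent)."""
    records = {}
    root = ET.fromstring(pubmed_xml)
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        # MeSH terms live under MedlineCitation/MeshHeadingList in PubMed records.
        mesh = [d.text for d in article.findall(
            "MedlineCitation/MeshHeadingList/MeshHeading/DescriptorName")]
        # The PMC identifier is one of the article IDs listed under PubmedData.
        pmcid = None
        for article_id in article.findall("PubmedData/ArticleIdList/ArticleId"):
            if article_id.get("IdType") == "pmc":
                pmcid = article_id.text
        records[pmid] = {"mesh": mesh, "pmcid": pmcid}
    return records


# Hypothetical, heavily trimmed record used only to exercise the parser.
SAMPLE = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345</PMID>
      <MeshHeadingList>
        <MeshHeading><DescriptorName>Humans</DescriptorName></MeshHeading>
        <MeshHeading><DescriptorName>Neoplasms</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
    <PubmedData>
      <ArticleIdList>
        <ArticleId IdType="pubmed">12345</ArticleId>
        <ArticleId IdType="pmc">PMC67890</ArticleId>
      </ArticleIdList>
    </PubmedData>
  </PubmedArticle>
</PubmedArticleSet>
"""

if __name__ == "__main__":
    print(mesh_and_pmcid(SAMPLE))
```

The PMC ID collected here is what the second efetch (with db="pmc") would then be given. Records that have not been indexed for MEDLINE simply come back with an empty MeSH list.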
I'm not certain I follow. I should use esearch() with db="pubmed" and then call efetch twice, once with db="pubmed" and once with db="pmc"? Does that mean that the same articles hosted on different databases have different metadata? Why on earth would that be the case? Furthermore, how do I limit my search to PMC's "open access" section on PubMed?
let's check the pubmed DTD:
how about the PMC DTD?
(nothing)
ask ncbi
unless I'm wrong, PMC is a "free full-text archive"
Thank you for all of your help.
For my purposes, I need to use the Open Access Subset:
I'm trying the opposite -- getting the PubMed article from the PMID in the PMC articles. It seems to work on small batches and I've found some MeSH terms, but I have 45k articles to go through, so we'll know for sure when that's done.
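For anyone following the same route, here is a rough sketch of that reverse lookup, assuming the PMC XML carries the PMID as an `<article-id pub-id-type="pmid">` element in each article's front matter. The helper names `pmids_from_pmc` and `batches`, the batch size, and the sample document are illustrative assumptions, not part of Biopython.

```python
import xml.etree.ElementTree as ET


def pmids_from_pmc(pmc_xml):
    """Return the PMID of every <article> in a PMC XML response that lists one."""
    root = ET.fromstring(pmc_xml)
    pmids = []
    for article in root.iter("article"):
        # Only look at the article's own front matter, not at cited references.
        for article_id in article.findall("front/article-meta/article-id"):
            if article_id.get("pub-id-type") == "pmid":
                pmids.append(article_id.text)
    return pmids


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical, heavily trimmed response: the second article has no PMID.
SAMPLE_PMC = """
<pmc-articleset>
  <article>
    <front><article-meta>
      <article-id pub-id-type="pmid">12345</article-id>
      <article-id pub-id-type="pmc">67890</article-id>
    </article-meta></front>
  </article>
  <article>
    <front><article-meta>
      <article-id pub-id-type="pmc">67891</article-id>
    </article-meta></front>
  </article>
</pmc-articleset>
"""

if __name__ == "__main__":
    pmids = pmids_from_pmc(SAMPLE_PMC)
    for chunk in batches(pmids, 200):  # 200 is an arbitrary batch size
        print(",".join(chunk))  # comma-separated IDs for one db="pubmed" efetch
```

Batching the PMIDs keeps the number of requests manageable for tens of thousands of articles. Articles without a PMID are skipped here, so it is worth counting them separately if coverage matters.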