Taverna Workflow To Retrieve 1500 Papers For One Or Two Keywords As Plain Text?
13.5 years ago

What Taverna 2 workflow can I use to run a query against the Open Access subset of PubMed and return up to 1500 papers for further processing, including text mining? If no complete workflow is available, I am interested in workflows that do similar things, particularly if they are hosted on MyExperiment.org. I am happy if it involves new plugins. I'm also happy with solutions that use XSLT and BeanShell scripts.

Update: the bounty goes to the most functional (open source) Taverna BeanShell script.

literature pubmed text • 8.5k views

I do not know of a Taverna workflow that does this already, but you can easily retrieve an XML document listing the PMC open-access identifiers via:

http://www.pubmedcentral.nih.gov/oai/oai.cgi?verb=ListIdentifiers&metadataPrefix=pmc&set=pmc-open
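
For what it's worth, here is a rough Python sketch of that ListIdentifiers call, which could be ported to a BeanShell step. It assumes the endpoint still answers at the URL above (it may have moved since) and that the response follows the standard OAI-PMH 2.0 layout, where long lists are paged with a `resumptionToken`. The helper names (`list_identifiers_url`, `parse_identifiers`, `harvest`) are made up for illustration, and `fetch` is any function you supply that takes a URL and returns the response body.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base URL copied from the post; the service may have moved since then.
OAI_BASE = "http://www.pubmedcentral.nih.gov/oai/oai.cgi"
# Standard OAI-PMH 2.0 namespace used by response elements.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(resumption_token=None):
    """Build a ListIdentifiers request for the pmc-open set (hypothetical helper)."""
    if resumption_token:
        # In OAI-PMH a resumptionToken is sent alone, next to the verb.
        params = {"verb": "ListIdentifiers", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListIdentifiers", "metadataPrefix": "pmc", "set": "pmc-open"}
    return OAI_BASE + "?" + urlencode(params)


def parse_identifiers(xml_text):
    """Return (identifiers, resumption_token_or_None) from one response page."""
    root = ET.fromstring(xml_text)
    ids = [el.text.strip()
           for el in root.iterfind(".//oai:header/oai:identifier", OAI_NS)
           if el.text]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return ids, token


def harvest(fetch, limit=1500):
    """Collect up to `limit` identifiers; `fetch(url)` must return the XML body."""
    ids, token = [], None
    while len(ids) < limit:
        page, token = parse_identifiers(fetch(list_identifiers_url(token)))
        ids.extend(page)
        if not token or not page:
            break  # no further pages
    return ids[:limit]
```

To run it against the live service you could pass something like `lambda url: urllib.request.urlopen(url).read()` as `fetch`; I have not checked how the server behaves today.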

From that XML-document you can then get the individual records via further queries, such as: http://www.pubmedcentral.nih.gov/oai/oai.cgi?verb=GetRecord&identifier=oai:pubmedcentral.nih.gov:17827&metadataPrefix=pmc
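
Since the question asks for plain text, here is a small Python sketch of the GetRecord step. It builds the same kind of URL as above and flattens the returned XML to text. I am assuming the full text sits in an element named `body` (as in the JATS article format); if no such element is found, the sketch falls back to the text of the whole record. `get_record_url` and `record_to_text` are hypothetical names.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base URL copied from the post; the service may have moved since then.
OAI_BASE = "http://www.pubmedcentral.nih.gov/oai/oai.cgi"


def get_record_url(pmc_id):
    """Build a GetRecord request for one PMC record (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "identifier": f"oai:pubmedcentral.nih.gov:{pmc_id}",
        "metadataPrefix": "pmc",
    }
    # safe=":" keeps the colons readable, matching the URL in the post.
    return OAI_BASE + "?" + urlencode(params, safe=":")


def record_to_text(xml_text):
    """Flatten a record to plain text, preferring the article body if present."""
    root = ET.fromstring(xml_text)
    # Assumption: the full text lives in a <body> element, whatever its namespace.
    body = next(
        (el for el in root.iter()
         if isinstance(el.tag, str) and el.tag.rsplit("}", 1)[-1] == "body"),
        root,
    )
    # Join text fragments with spaces, then collapse runs of whitespace.
    return " ".join(" ".join(body.itertext()).split())
```

Joining fragments with a space keeps paragraphs from running together, at the cost of an extra space around inline markup such as subscripts, so you may want to adjust that for text mining.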

More about PMC-OAI at: http://www.ncbi.nlm.nih.gov/pmc/about/oai.html

Hope this helps a bit, even though I cannot answer your original question.


Would you be happy to use UKPubmed instead of PubMed? http://ukpmc.ac.uk/ UKPubmed indexes only open-access documents and, unlike PubMed, it also indexes PhD theses published in the UK. I am not sure I will have time to prepare a Taverna workflow for this, but if you look at the page it should not be difficult.


Absolutely, either is fine!

13.3 years ago
Stain ▴ 20

I assume you've checked http://www.myexperiment.org/search?query=pubmed&type=all already..?

13.3 years ago

I am afraid it is very difficult to do that.

One option is to log in to PubMed Central, run a query, and then retrieve all the PDFs. I am sure you can automate it with Taverna, but it has been too long since I last used it and I am not able to program it.

Another option is to use the UKPubmed archive: run a query and then download all the PDFs by constructing URLs like:

http://ukpmc.ac.uk/articles/<_insert_Pubmed_Central_ID_>?pdf=render

It seems that UKPubmed does not have an API like Entrez. I don't know whether you can get the PubMed Central ID from the Entrez E-utilities, which are already implemented in Taverna, but if you can, it would be easier.
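
To make the idea concrete, here is an untested-against-the-live-services Python sketch of that route: ask E-utilities ESearch for matching records in the `pmc` database, read the IDs out of the XML, and plug each one into the UKPubmed URL pattern above. Several things are assumptions on my part: that ESearch on `db=pmc` returns the numeric part of the PMC ID, that `"open access"[filter]` restricts results to the open-access subset, and that the UKPubmed URL expects the ID with a `PMC` prefix. The function names are made up for illustration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
# URL pattern taken from the post; the "PMC" prefix on the ID is an assumption.
UKPMC_PDF = "http://ukpmc.ac.uk/articles/{pmcid}?pdf=render"


def esearch_url(keywords, retmax=1500):
    """Build an ESearch query for the keywords, limited to open-access records."""
    # Assumption: this filter selects the PMC open-access subset.
    term = " AND ".join(keywords) + ' AND "open access"[filter]'
    return ESEARCH + "?" + urlencode({"db": "pmc", "term": term, "retmax": retmax})


def parse_id_list(xml_text):
    """Return the IDs listed under <IdList> in an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iterfind("./IdList/Id") if el.text]


def ukpmc_pdf_url(uid):
    """Turn an ID such as '17827' or 'PMC17827' into a UKPubmed PDF URL."""
    uid = str(uid)
    pmcid = uid if uid.upper().startswith("PMC") else "PMC" + uid
    return UKPMC_PDF.format(pmcid=pmcid)
```

Usage would be along the lines of fetching `esearch_url(["malaria", "vaccine"])`, passing the response to `parse_id_list`, and mapping `ukpmc_pdf_url` over the result. Note that this yields PDFs, so a separate step would still be needed to get plain text.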

In any case, with all of these methods you will only be able to download the open-access articles.


Only OA is fine :)

13.3 years ago
Andrea_Bio ★ 2.8k

Is this any use? If it is not exactly what you need, you could modify the top two.

http://www.myexperiment.org/packs/163.html
