NCBI eutils EFetch: fetch only the elements needed (or remove the abstract from the response)?
3.2 years ago

Hello,

I am using the eutils API to fetch article metadata. Each record includes a lot of information, which makes responses slow when I download data for thousands of articles. Is it possible to fetch only the elements I need?

I have tried the XML, MEDLINE text, and abstract text formats, but couldn't reduce the response size to what I need.

For example: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=33631753,33631593&rettype=abstract&retmode=text

I am trying to reduce the number of elements returned per article, and hence the response size. Is there any way to tell the API call to return only what I need?

E.g., can I exclude the abstract from the response? That would remove a huge chunk of data and reduce the response size massively!

Thanks.

efetch eutils API pubmed ncbi • 1.4k views

I don't think you can do that. The API is designed to retrieve a minimum set of elements. You will have to post-filter.


Do you mean filtering after getting the responses, or is it an option in the API?

3.1 years ago

You can't, but you can always filter the result on the fly, e.g.:

wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=33631753,33631593&rettype=medline&retmode=text" | awk '{if($0 ~ /^ /) {if(A!=1) print;next} else if($1=="AB") {A=1} else {print;A=0;}}'
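
For anyone who prefers not to use awk, the same filter can be sketched in Python. This is only a rough equivalent of the one-liner above, assuming the usual MEDLINE text layout (a field tag in the first four columns, continuation lines starting with a space). The function name `drop_medline_field` and the sample record are made up for illustration.

```python
def drop_medline_field(medline_text, tag="AB"):
    """Return MEDLINE-format text with every `tag` field removed."""
    kept = []
    skipping = False
    for line in medline_text.splitlines():
        if line.startswith(" "):
            # Continuation line: it belongs to the most recent field.
            if not skipping:
                kept.append(line)
            continue
        # A new field (or a blank line between records) starts here.
        skipping = line[:4].strip() == tag
        if not skipping:
            kept.append(line)
    return "\n".join(kept)


# Hypothetical record, not real PubMed data.
sample = (
    "PMID- 1\n"
    "TI  - A title that\n"
    "      wraps.\n"
    "AB  - First abstract line\n"
    "      second abstract line.\n"
    "FAU - Doe, Jane\n"
)

if __name__ == "__main__":
    print(drop_medline_field(sample))
```

Passing a different `tag` drops another field instead. Note that this still downloads the full record; it only trims what is kept afterwards.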

In XML mode, you can filter the result by writing a SAX parser: https://en.wikipedia.org/wiki/Simple_API_for_XML#XML_processing_with_SAX
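
A minimal sketch of such a SAX filter, using Python's standard `xml.sax` module. It keeps only the PMID and the article title and never stores the abstract. The handler name `TitleHandler` and the sample XML are invented for illustration; the sample is a heavily simplified stand-in for a real `PubmedArticleSet` response, so the element paths should be checked against actual output.

```python
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Collect (PMID, ArticleTitle) pairs and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.records = []        # finished (pmid, title) tuples
        self._path = []          # stack of currently open element names
        self._pmid_parts = []
        self._title_parts = []

    def startElement(self, name, attrs):
        self._path.append(name)
        if name == "PubmedArticle":
            # Start of a new record: reset the per-article buffers.
            self._pmid_parts, self._title_parts = [], []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._path[-2:] == ["MedlineCitation", "PMID"]:
            self._pmid_parts.append(content)
        elif "ArticleTitle" in self._path:
            # Also catches inline markup such as <i> inside the title.
            self._title_parts.append(content)

    def endElement(self, name):
        self._path.pop()
        if name == "PubmedArticle":
            self.records.append(
                ("".join(self._pmid_parts), "".join(self._title_parts))
            )


# Simplified, hypothetical sample; real responses contain many more elements.
SAMPLE_XML = b"""<PubmedArticleSet>
  <PubmedArticle><MedlineCitation>
    <PMID Version="1">111</PMID>
    <Article>
      <ArticleTitle>First <i>example</i> title.</ArticleTitle>
      <Abstract><AbstractText>Long abstract text.</AbstractText></Abstract>
    </Article>
  </MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation>
    <PMID Version="1">222</PMID>
    <Article><ArticleTitle>Second title.</ArticleTitle></Article>
  </MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""


def parse_titles(xml_bytes):
    handler = TitleHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.records


if __name__ == "__main__":
    for pmid, title in parse_titles(SAMPLE_XML):
        print(pmid, title)
```

`xml.sax.parse` also accepts a file-like object, so the same handler could read directly from an open HTTP response without holding the whole document in memory.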


Thanks for the reply, Pierre. Guessed so.

I can filter later, but if I am fetching 10,000 efetch docs, for example, the responses become rather data-heavy and crash the browser. Also, the initial time taken to get the response for retmax=10000, for example, is pretty slow when it returns all the fields, as it does now.

I am trying to find the best way to get data for all the articles related to a search term as quickly as possible. Given that I cannot choose which elements to return, that there is a limit of up to 10 API calls a second, and that each call takes 30-90 seconds to return (retmax=10000), there doesn't seem to be a solution, unless I am missing something really obvious.


"... and crashes the browser."

Use a command-line tool instead of the browser.
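
Outside the browser, one option is a small script that requests the records in batches, so that no single response gets huge. The following is a minimal Python sketch under stated assumptions, not a tuned solution: `efetch_urls` and `fetch_all` are hypothetical helper names, and the batch size of 200 and the delay between requests are illustrative guesses that should be adjusted to the rate limit that applies to you. The `api_key` parameter is optional.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_urls(pmids, batch_size=200, api_key=None):
    """Build one EFetch URL per batch of PMIDs (MEDLINE text format)."""
    urls = []
    for start in range(0, len(pmids), batch_size):
        batch = pmids[start:start + batch_size]
        params = {
            "db": "pubmed",
            "id": ",".join(str(p) for p in batch),
            "rettype": "medline",
            "retmode": "text",
        }
        if api_key:
            params["api_key"] = api_key
        # safe="," keeps the ID list readable instead of encoding commas.
        urls.append(EFETCH + "?" + urlencode(params, safe=","))
    return urls


def fetch_all(pmids, delay=0.34, **kwargs):
    """Yield the text of each batch, pausing between requests.

    The delay is an assumed value chosen to stay under roughly
    three requests per second; change it to match your own limit.
    """
    for url in efetch_urls(pmids, **kwargs):
        with urlopen(url) as response:
            yield response.read().decode("utf-8")
        time.sleep(delay)


if __name__ == "__main__":
    # Only prints the URLs; no network request is made here.
    for url in efetch_urls([33631753, 33631593, 1], batch_size=2):
        print(url)
```

Each batch can be filtered as soon as it arrives (for example by dropping the AB field) and written to disk, so memory use stays flat however many articles are requested.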


@Pierre is referring to Entrez Direct (EDirect):

$ efetch -db pubmed -id 33631753 -format xml | xtract -pattern PubmedArticle -element LastName,ForeName -sep "\n"
Wang    Yang    Li  Wang    Shan    Qu  Wenxin  Zhaochuan   Meixiang    Zhenhong    Yanchun Zhenghai

Thanks. I will learn more about Entrez Direct.
