Question

Complex queries on pubmed abstracts

0

10.0 years ago

fedotovp • 0

I want to run a complex query, or a large number of queries, against the PubMed API.

My problem: I have a lot of gene symbols (~20,000) and a search term as input (for example, inflammation). I want to search through all PubMed titles and abstracts and get a list of the genes that most often co-occur with the term. So my naive algorithm is to make 20,000 queries like

gene1 AND term

gene2 AND term

...,

gene20000 AND term

and sort the genes by the number of results for each query. But of course I can't make that many queries (there is a limit on the number of queries per second).

Another way is to make one query for the term, download all the results, and then search them locally. But there could be a lot of results, and downloading them may take hours in that case.

Do you know any way to make such queries relatively fast?

pubmed • 1.9k views

ADD COMMENT • link updated 10.0 years ago by Pierre Lindenbaum 161k • written 10.0 years ago by fedotovp • 0

Ram · Accepted Answer · 2014-04-21

2

10.0 years ago

Pierre Lindenbaum 161k

Using esearch, get all the PMIDs for:

term

then intersect them with gene2pubmed: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2pubmed.gz
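The intersection step could be sketched roughly as follows. This is only an illustration, not necessarily how the answer's author would do it. It assumes the PMIDs for the term have already been collected into a set, and that gene2pubmed is a tab-separated file with the columns `#tax_id`, `GeneID`, `PubMed_ID`. The function names, the human `tax_id` default and the toy rows are made up for the example.

```python
import gzip
from collections import Counter


def count_genes_for_pmids(lines, term_pmids, tax_id="9606"):
    """Count, per GeneID, how many of the term's PMIDs are linked to it.

    `lines` is any iterable of gene2pubmed rows (assumed layout:
    tax_id <TAB> GeneID <TAB> PubMed_ID). `term_pmids` is a set of
    PMID strings obtained separately for the search term.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#"):  # skip the header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:  # skip malformed or empty rows
            continue
        row_tax, gene_id, pmid = fields[:3]
        if row_tax == tax_id and pmid in term_pmids:
            counts[gene_id] += 1
    return counts


def count_genes_in_file(path, term_pmids, tax_id="9606"):
    """Same as above, reading a local copy of gene2pubmed.gz."""
    with gzip.open(path, "rt") as handle:
        return count_genes_for_pmids(handle, term_pmids, tax_id)


# Toy rows standing in for gene2pubmed (values are illustrative only).
sample = [
    "#tax_id\tGeneID\tPubMed_ID\n",
    "9606\t7124\t111\n",
    "9606\t7124\t222\n",
    "9606\t3569\t222\n",
    "10090\t21926\t111\n",
    "9606\t3569\t333\n",
]
term_pmids = {"111", "222"}  # pretend these came back for the term

ranked = count_genes_for_pmids(sample, term_pmids).most_common()
print(ranked)  # [('7124', 2), ('3569', 1)]
```

The file is read once and each row costs a single set lookup, so no per-gene queries are needed. Note that the result is keyed by GeneID rather than gene symbol, so mapping back to symbols would need a separate lookup table.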


0

Thank you! Didn't know about that database before.

It looks like this database isn't complete, though: at least two random PMIDs I checked weren't found there.

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 10.0 years ago by fedotovp • 0

0

These are only the PMIDs declared in the 'gene' database, so articles that are not linked to a Gene record will not appear there.

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 10.0 years ago by Pierre Lindenbaum 161k