Complex queries on PubMed abstracts
10.0 years ago
fedotovp • 0

I want to run a complex query, or a large number of queries, against the PubMed API.

My problem: I have a lot of gene symbols (~20,000) and a term as input (for example, inflammation). I want to search all PubMed titles and abstracts and get a list of the genes that most often co-occur with the term. My naive algorithm is to make 20,000 queries like

gene1 AND term

gene2 AND term

...,

gene20000 AND term

and sort the genes by the number of results for each query. But of course I can't make that many queries (there is a limit on the number of queries per second).

Another way is to query for the term, download all the results, and then search them locally. But there could be a lot of results, and downloading them may take hours in that case.

Do you know any way to run such queries relatively fast?

pubmed • 1.9k views
10.0 years ago
Using esearch (the E-utility that returns the PMIDs matching a query), get all the PMIDs for:
term
then take the intersection with gene2pubmed: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2pubmed.gz
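
A rough sketch of that intersection in Python, in case it helps. It assumes gene2pubmed is tab-separated with the columns tax_id, GeneID, PubMed_ID and a header line starting with "#", and that the term's PMIDs come from the E-utilities esearch endpoint in JSON mode. The function names, the "[tiab]" field tag, the retmax value and the human tax_id "9606" filter are my own choices for illustration. Note that gene2pubmed is keyed by GeneID, not by symbol, so mapping back to symbols would be a separate step, and a term with many hits would need paging through the results.

```python
import csv
import gzip
import json
from collections import Counter
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, retstart=0, retmax=1000):
    """Build an esearch URL that asks PubMed for PMIDs matching `term`."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "retstart": retstart,
        "retmax": retmax,
    }
    return ESEARCH + "?" + urlencode(params)


def parse_esearch_ids(json_text):
    """Extract the set of PMIDs (as strings) from an esearch JSON response."""
    return set(json.loads(json_text)["esearchresult"]["idlist"])


def count_genes_for_pmids(lines, term_pmids, tax_id="9606"):
    """Count, per GeneID, how many of the term's PMIDs are linked to it."""
    counts = Counter()
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and the "#tax_id ..." header.
        if not row or row[0].startswith("#"):
            continue
        row_tax, gene_id, pmid = row[0], row[1], row[2]
        if row_tax == tax_id and pmid in term_pmids:
            counts[gene_id] += 1
    return counts


def count_genes_in_file(path, term_pmids, tax_id="9606"):
    """Same as above, reading a local copy of gene2pubmed.gz."""
    with gzip.open(path, "rt", newline="") as handle:
        return count_genes_for_pmids(handle, term_pmids, tax_id)


# Made-up rows in the assumed gene2pubmed layout, just to show the counting.
sample = [
    "#tax_id\tGeneID\tPubMed_ID",
    "9606\t1\t100",
    "9606\t1\t101",
    "9606\t2\t100",
    "10090\t3\t100",
]
url = esearch_url("inflammation[tiab]")
pmids = parse_esearch_ids('{"esearchresult": {"idlist": ["100", "101"]}}')
ranked = count_genes_for_pmids(sample, pmids).most_common()
```

This way the 20,000 per-gene queries are replaced by one search for the term plus a single pass over a local file.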

Thank you! I didn't know about that database before.

It looks like this database isn't complete, though: at least two random PMIDs weren't found there.


These are the PMIDs declared in the 'gene' database.
