Question

Get publication counts per gene per year

5.1 years ago

abhijit.synl ▴ 60

Hi

I would like to write an EDirect query to extract the number of publications per gene per year. The group I am interested in is Viridiplantae. For every species in this group, given a date range, I would like to get the publication count for each gene in that species. The final output I am looking for is something like:

YEAR  Genus_Species         Gene_Symbol  Publication_Count
1970  Arabidopsis thaliana  PHYA         3
1971  Arabidopsis thaliana  PHYA         2

I have gotten this far: for an example gene ID (816394) in the taxon Arabidopsis thaliana (txid3702), I can get the count of all PubMed articles linked to this gene:

esearch -db gene -query "txid3702[Organism:exp] AND 816394[UID]" | elink -target pubmed

The next step is to download the articles in XML or docsum format and filter them by publication date [PDAT]. That is the strategy I am using. I tried the following command, but it failed with the error "Too many requests":

esearch -db gene -query "txid3702[Organism:exp] AND 816394[UID]" | elink -target pubmed | efetch -format xml | xtract -pattern PubmedArticle -element PubDate

I don't know how to get around this. Thanks for the help.

eutilities • 999 views

ADD COMMENT • link updated 11 months ago by Ram 43k • written 5.1 years ago by abhijit.synl ▴ 60

There are a few links and posts for similar tasks:

1) https://rpsychologist.com/an-r-script-to-automatically-look-at-pubmed-citation-counts-by-year-of-publication

2) The solution below suggests XML output:

BioPython ESearch XML File & Dates

3) edirect: Number of authors per year in Pubmed for a given query

4) Attempting To Utilise The New Entrez Direct Package But Having Difficulty With Pubmed And Nucleotide Xml Parsing

5) How do I make this perl script work to fetch sequences from NCBI using gene symbols?
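
For the per-year tally itself, here is a minimal sketch rather than a tested solution. It reuses the esearch/elink/efetch chain from the question, extracts only PubDate/Year with xtract, and counts the years with sort and uniq. The function names gene_pub_years and count_by_year are hypothetical, made up for this example. It assumes each PubmedArticle record has a single PubDate/Year; records that only carry a MedlineDate are dropped by the year filter. As far as I know, exporting an NCBI API key in the NCBI_API_KEY environment variable raises the allowed request rate, which may help with "Too many requests".

```shell
#!/bin/sh
# Hypothetical helper: read one publication year per line on stdin and
# print "YEAR<TAB>COUNT", sorted by year. Lines that are not a plain
# four-digit year (blank lines, MedlineDate strings) are ignored.
count_by_year() {
  grep -E '^[0-9]{4}$' | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Hypothetical helper: print the publication year of every PubMed article
# linked to one Gene record. $1 is the NCBI taxonomy ID, $2 is the Gene UID.
# Assumption: PubDate/Year is present once per article.
gene_pub_years() {
  esearch -db gene -query "txid$1[Organism:exp] AND $2[UID]" |
    elink -target pubmed |
    efetch -format xml |
    xtract -pattern PubmedArticle -element PubDate/Year
}

# Usage (needs EDirect installed and network access):
#   export NCBI_API_KEY="your_key_here"   # optional, placeholder value
#   gene_pub_years 3702 816394 | count_by_year
```

Looping this over many genes still means one request chain per gene, so a pause between calls (for example sleep 1) is probably sensible. The species name and gene symbol columns would have to be added separately, for instance from the gene docsum.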

ADD REPLY • link 5.1 years ago by natasha.sernova ★ 4.0k