Hi
I would like to write a edirect query to extract number of publications per gene per year. The group I am interested in is Viridiplantae. So for all species under this group, given a date range, I would like to get the publication count for each gene in that species. The final output that I am looking for is something like
YEAR Genus_Species Gene_Symbol Publication_Count
1970 Arabidopsis thaliana PHYA 3
1971 Arabidopsis thaliana PHYA 2
I have gotten this far. For an example gene id (816394) in taxon Arabidopsis thaliana (txid3702) I can get the count of all the pubmed articles related to this gene
esearch -db gene -query "txid3702[Organism:exp] AND 816394[UID]" | elink -target pubmed
After this the next step is to download in xml or docsum format the articles and filter the articles by date [PDAT] of publication. This is the strategy I am using. I used this next command but the error was "Too many requests"
esearch -db gene -query "txid3702[Organism:exp] AND 816394[UID]" | elink -target pubmed | efetch -format xml | xtract -pattern PubmedArticle -element PubDate
I don't know how to get around this. Thanks for the help
There are a few links and posts for similar tasks:
1) https://rpsychologist.com/an-r-script-to-automatically-look-at-pubmed-citation-counts-by-year-of-publication
2)The solution below suggests xml-output:
BioPython ESearch XML File & Dates
3) edirect: Number of authors per year in Pubmed for a given query
4) Attempting To Utilise The New Entrez Direct Package But Having Difficulty With Pubmed And Nucleotide Xml Parsing
5) How do I make this perl script work to fetch sequences from NCBI using gene symbols?