How to get genomes related to published paper in NCBI
14 months ago

Hi!

I'm currently trying to automate a search in two steps. 1) Get all genomes on NCBI for a certain organism, restricted to RefSeq and so on. I'm doing that with Biopython and Entrez:

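from Bio import Entrez
Entrez.email = "your.name@example.com"  # placeholder: NCBI asks for a contact email
query = "Microbacterium[Organism] AND latest_refseq[filter] NOT partial[filter]"
handle = Entrez.esearch(term=query, db="Assembly", retmax=600)
ids = Entrez.read(handle)["IdList"]
handle.close()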

This returns 513 genomes.

2) The second part would be to add another filter so that I get only the assemblies linked to a published paper, but I have no idea how to do that. Unfortunately, I can't see any useful publication-related tags for the Assembly db in Biopython, but maybe other ways exist? I'm open to any approach, not just Python/Biopython. Thanks!


Yeah, thanks, but I don't need help with the first part. I know how to download my genomes; what I don't know is how to add information on whether each genome is linked to a published paper.

14 months ago
GenoMax 141k

Using EntrezDirect. This may not work for all accessions. In your case, only 41 results from the original search seem to be linked to a paper.

$ esearch -db assembly -query "Microbacterium[Organism] AND latest_refseq[filter] NOT partial[filter]" |  elink -target pubmed | esummary | xtract -pattern DocumentSummary -element Id,Title,Value 
35331789        Genome sequencing of a novel Microbacterium camelliasinensis CIAB417 identified potential mannan hydrolysing enzymes.   35331789    10.1016/j.ijbiomac.2022.03.093  S0141-8130(22)00563-3
34371613        Identification of Plant Growth Promoting Rhizobacteria That Improve the Performance of Greenhouse-Grown Petunias under Low Fertility Conditions.    34371613        PMC8309264      pmc-id: PMC8309264;     10.3390/plants10071410  plants10071410
34225488        Poor Competitiveness of Bradyrhizobium in Pigeon Pea Root Colonization in Indian Soils. 34225488        PMC8406239      pmc-id: PMC8406239; 10.1128/mBio.00423-21
34022615        Bacteria of eleven different species isolated from biofilms in a meat processing environment have diverse biofilm forming abilities.        34022615        10.1016/j.ijfoodmicro.2021.109232       S0168-1605(21)00191-4
33578887        Comparative Metabologenomics Analysis of Polar Actinomycetes.   33578887        PMC7916644      pmc-id: PMC7916644;     10.3390/md19020103  md19020103
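If you would rather stay in Python, the `esummary | xtract` stage can be approximated with the E-utilities JSON output. The sketch below is a minimal, stdlib-only version under stated assumptions: it assumes PubMed `esummary` with `retmode=json` returns a `result` object keyed by PMID, each entry carrying `title` and an `articleids` list, and the helper names `parse_pubmed_summary` and `fetch_pubmed_summary` are hypothetical, not part of any library.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def parse_pubmed_summary(payload: dict) -> list[tuple[str, str, str | None]]:
    """Turn an esummary JSON payload into (PMID, title, DOI or None) tuples."""
    result = payload.get("result", {})
    rows = []
    for uid in result.get("uids", []):
        doc = result[uid]
        # Assumption: articleids holds entries like {"idtype": "doi", "value": "10.1016/..."}
        doi = next(
            (a["value"] for a in doc.get("articleids", []) if a.get("idtype") == "doi"),
            None,
        )
        rows.append((uid, doc.get("title", ""), doi))
    return rows


def fetch_pubmed_summary(pmids: list[str], email: str) -> list[tuple[str, str, str | None]]:
    """Hypothetical helper: one esummary call for a batch of PMIDs (needs network)."""
    query = urllib.parse.urlencode(
        {"db": "pubmed", "id": ",".join(pmids), "retmode": "json", "email": email}
    )
    with urllib.request.urlopen(f"{ESUMMARY}?{query}") as resp:
        return parse_pubmed_summary(json.load(resp))
```

For example, `fetch_pubmed_summary(["35331789", "34371613"], "your.name@example.com")` should give one (PMID, title, DOI) row per paper, roughly matching the Id, Title and DOI columns printed by `xtract`.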

Note: While I show the example as a one-step search, you may need to do it in two steps if you want to keep track of the accession numbers and which ones actually produce a result. Piping in EntrezDirect does not keep track of the originating query across the pipes, so you will need to do this yourself.
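One way to keep that bookkeeping is to call ELink yourself with a separate `id=` parameter per assembly UID; E-utilities then returns one LinkSet per UID instead of merging them into a single set. A minimal stdlib sketch, assuming the UIDs come from the `esearch` step in the question (the `ids` list) and that `pmids_by_assembly` and `fetch_assembly_pubmed_links` are hypothetical helper names. For several hundred UIDs you may want to send smaller batches and respect NCBI's request-rate limits.

```python
from __future__ import annotations

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def pmids_by_assembly(xml_text: str | bytes) -> dict[str, list[str]]:
    """Map each source Assembly UID in an eLinkResult to its linked PubMed IDs."""
    links: dict[str, list[str]] = {}
    for linkset in ET.fromstring(xml_text).iter("LinkSet"):
        source = linkset.findtext("IdList/Id")
        pmids = [
            link_id.text
            for lsdb in linkset.findall("LinkSetDb")
            if lsdb.findtext("DbTo") == "pubmed"
            for link_id in lsdb.findall("Link/Id")
        ]
        # dict.fromkeys drops duplicate PMIDs while keeping their order
        links[source] = list(dict.fromkeys(pmids))
    return links


def fetch_assembly_pubmed_links(uids: list[str], email: str) -> dict[str, list[str]]:
    """Hypothetical helper: one POSTed elink call, one LinkSet per UID (needs network)."""
    params = [("dbfrom", "assembly"), ("db", "pubmed"), ("email", email)]
    params += [("id", uid) for uid in uids]  # repeated id= keeps the UIDs separate
    data = urllib.parse.urlencode(params).encode()
    with urllib.request.urlopen(ELINK, data=data) as resp:
        return pmids_by_assembly(resp.read())
```

An assembly whose list is empty (or missing) has no PubMed link, so `links = fetch_assembly_pubmed_links(ids, "your.name@example.com")` followed by `[u for u in ids if links.get(u)]` keeps only the assemblies tied to a paper.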
