Question

How to get genomes related to published paper in NCBI

14 months ago

marine.bergot • 0

Hi!

I'm trying to automate a search. 1) Get all genomes on NCBI for a given organism, restricted to the latest RefSeq assemblies and excluding partial ones. I'm doing that with Biopython and Entrez:

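from Bio import Entrez

Entrez.email = "you@example.org"  # placeholder address; NCBI asks for a contact email
query = "Microbacterium[Organism] AND latest_refseq[filter] NOT partial[filter]"
handle = Entrez.esearch(term=query, db="Assembly", retmax=600)
ids = Entrez.read(handle)["IdList"]  # Assembly UIDs matching the query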

This returns 513 genomes.

2) The second part would be to add another filter to keep only the assemblies linked to a published paper, but I have no idea how to do that. Unfortunately, I can't see any useful publication-related tags for the Assembly database in Biopython, but maybe other ways exist? I'm open to any approach, not just Python/Biopython. Thanks!

ncbi bash biopython • 610 views

ADD COMMENT • link updated 14 months ago by GenoMax 141k • written 14 months ago by marine.bergot • 0

How to download all Pseudomonas aeruginosa Genomes from NCBI Genomes database?
How to download specific genomes
How to download genome assemblies from NCBI with a list of GCA identifiers?
downloading genomes in fasta format from accession ids

ADD REPLY • link 14 months ago by GenoMax 141k

Yeah, thanks, but I don't need help with the first part. I know how to download my genomes, but not how to add information about whether each genome is linked to a published paper.

ADD REPLY • link 14 months ago by marine.bergot • 0

score 1 · Answer 1 · 2023-02-09

You can do this with EntrezDirect. It may not work for every accession. In your case, only 41 results from the original search seem to be linked to a paper.

$ esearch -db assembly -query "Microbacterium[Organism] AND latest_refseq[filter] NOT partial[filter]" | elink -target pubmed | esummary | xtract -pattern DocumentSummary -element Id,Title,Value
35331789        Genome sequencing of a novel Microbacterium camelliasinensis CIAB417 identified potential mannan hydrolysing enzymes.   35331789    10.1016/j.ijbiomac.2022.03.093  S0141-8130(22)00563-3
34371613        Identification of Plant Growth Promoting Rhizobacteria That Improve the Performance of Greenhouse-Grown Petunias under Low Fertility Conditions.    34371613        PMC8309264      pmc-id: PMC8309264;     10.3390/plants10071410  plants10071410
34225488        Poor Competitiveness of Bradyrhizobium in Pigeon Pea Root Colonization in Indian Soils. 34225488        PMC8406239      pmc-id: PMC8406239; 10.1128/mBio.00423-21
34022615        Bacteria of eleven different species isolated from biofilms in a meat processing environment have diverse biofilm forming abilities.        34022615        10.1016/j.ijfoodmicro.2021.109232       S0168-1605(21)00191-4
33578887        Comparative Metabologenomics Analysis of Polar Actinomycetes.   33578887        PMC7916644      pmc-id: PMC7916644;     10.3390/md19020103  md19020103

Note: The example above runs as a single pipeline, but you may need to split it into two steps if you want to keep track of the accession numbers and of which ones actually return a result. EntrezDirect does not carry the original query IDs through the pipe, so you will need to track them yourself.
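Since the question started in Biopython, here is a rough sketch of that two-step idea in Python: link the Assembly UIDs to PubMed with `Entrez.elink`, then keep a mapping from each UID to its PubMed IDs. The helper names `pubmed_links` and `assemblies_with_papers` and the email argument are made up for illustration. It also assumes that your Biopython version sends a list passed as `id=` as separate `id` parameters, so that NCBI returns one link set per UID; if the IDs get joined into a single parameter, the per-UID mapping will not be reliable and you would need one call per UID instead.

```python
def pubmed_links(linksets):
    """Map each source UID to the PubMed IDs listed in parsed elink output."""
    links = {}
    for linkset in linksets:
        # A link set with no linked papers has an empty "LinkSetDb" list.
        pmids = [
            link["Id"]
            for linkdb in linkset.get("LinkSetDb", [])
            for link in linkdb["Link"]
        ]
        for uid in linkset["IdList"]:
            links.setdefault(uid, []).extend(pmids)
    return links


def assemblies_with_papers(ids, email):
    """Return {assembly UID: [PubMed IDs]} for UIDs that have at least one link.

    Hypothetical helper: needs network access and Biopython installed.
    """
    # Imported here so the parsing helper above works without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.elink(dbfrom="assembly", db="pubmed", id=ids)
    linksets = Entrez.read(handle)
    handle.close()
    return {uid: pmids for uid, pmids in pubmed_links(linksets).items() if pmids}
```

With the `ids` list from the question, `assemblies_with_papers(ids, "you@example.org")` should give a dictionary you can use to filter the assemblies or to look up the papers afterwards. These are Assembly UIDs, not GCF/GCA accessions, so you would still need an `esummary` call on the Assembly database if you want the accessions next to the PubMed IDs.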