Accessing the reference genome from the NCBI Genome database with Biopython
2.5 years ago
Daniel • 0

Hello all,

I would like to retrieve the RefSeq reference genome UID for a given taxonomy ID, using the Genome database through Biopython.

I will try to explain what I mean with images. I search the Genome database using a taxonomy ID. It returns a single result, and then I click on the "Reference genome" link.

[Image: searching the Genome database for a specific genome by taxonomy ID]

Now I scroll to the bottom of the page and get the RefSeq reference genome UID for the given taxonomy ID.

[Image: the RefSeq UID shown after clicking the link]

Is it possible to achieve this using Biopython?

taxonomyID genome reference biopython • 1.1k views

If you must use Biopython, you should be able to use the Bio.Entrez module.

Using EntrezDirect, you can simply do:

$ efetch -db nuccore -id NC_000913.3 -format fasta > NC_000913.fa
>NC_000913.3 Escherichia coli str. K-12 substr. MG1655, complete genome
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA
TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
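
For comparison, a rough Biopython counterpart of the efetch command above might look like the sketch below. It assumes Biopython is installed and that NCBI is reachable. The function names fetch_fasta and fasta_filename are made up for this illustration and are not part of Biopython. Only Entrez.efetch and Entrez.email come from Bio.Entrez.

```python
def fasta_filename(accession: str) -> str:
    """Build an output name without the version suffix (hypothetical helper)."""
    # "NC_000913.3" becomes "NC_000913.fa", mirroring the shell redirect above.
    return accession.split(".", 1)[0] + ".fa"


def fetch_fasta(accession: str, email: str) -> str:
    """Download one nuccore record as FASTA text (hypothetical helper)."""
    # Imported here so fasta_filename stays usable without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address on E-utilities requests
    handle = Entrez.efetch(
        db="nuccore", id=accession, rettype="fasta", retmode="text"
    )
    try:
        return handle.read()
    finally:
        handle.close()
```

To save the record, one could write the string returned by fetch_fasta("NC_000913.3", "you@example.org") to the path given by fasta_filename("NC_000913.3"). The email address here is a placeholder.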

Sorry, I think I did not explain myself clearly. I want to write code that takes only the taxonomy ID as input and returns the RefSeq reference genome UID, so that I can later fetch it from the nucleotide database as you posted (that part I already know how to do).

2.5 years ago
GenoMax 141k

Using EntrezDirect (output truncated to save space):

$ esearch -db taxonomy -query "1005566  [taxID]" | elink -target nuccore | efetch -format docsum | xtract -pattern DocumentSummary -if SourceDb -contains refseq -element Caption,Title,SourceDb
NZ_AMUP00000000 Escherichia coli 07798, whole genome shotgun sequencing project refseq
NZ_JH964525 Escherichia coli 07798 strain 7798 E07798.contig.252, whole genome shotgun sequence refseq
NZ_JH964524 Escherichia coli 07798 strain 7798 E07798.contig.251, whole genome shotgun sequence refseq
NZ_JH964523 Escherichia coli 07798 strain 7798 E07798.contig.249, whole genome shotgun sequence refseq
NZ_JH964522 Escherichia coli 07798 strain 7798 E07798.contig.248, whole genome shotgun sequence refseq
NZ_JH964521 Escherichia coli 07798 strain 7798 E07798.contig.247, whole genome shotgun sequence refseq
NZ_JH964520 Escherichia coli 07798 strain 7798 E07798.contig.246, whole genome shotgun sequence refseq
NZ_JH964519 Escherichia coli 07798 strain 7798 E07798.contig.245, whole genome shotgun sequence refseq
NZ_JH964518 Escherichia coli 07798 strain 7798 E07798.contig.244, whole genome shotgun sequence refseq
NZ_JH964517 Escherichia coli 07798 strain 7798 E07798.contig.241, whole genome shotgun sequence refseq

If you only want NC_* accessions, then:

$ esearch -db taxonomy -query "511145  [taxID]" | elink -target nuccore | efetch -format docsum | xtract -pattern DocumentSummary -if SourceDb -contains refseq -element Caption,Title,SourceDb | grep '^NC_'
NC_000913   Escherichia coli str. K-12 substr. MG1655, complete genome  refseq
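
Since the question asked for Biopython, here is a minimal sketch of a similar lookup with Bio.Entrez. It is not a line-for-line port of the pipeline above: instead of linking from the taxonomy database, it searches nuccore directly with the term txid<ID>[Organism:exp] AND refseq[filter], which is an assumption about what is wanted and may return a somewhat different set of records. The function names refseq_term, only_nc and refseq_records are hypothetical, and the Caption and Title keys are assumed to be present in the default nuccore summary, so check them against real output.

```python
def refseq_term(taxid: int) -> str:
    """Build a nuccore query restricted to RefSeq records under one taxon."""
    return f"txid{int(taxid)}[Organism:exp] AND refseq[filter]"


def only_nc(rows: list) -> list:
    """Keep rows whose accession starts with NC_, like the grep step above."""
    return [row for row in rows if row[0].startswith("NC_")]


def refseq_records(taxid: int, email: str, retmax: int = 20) -> list:
    """Return (accession, title) pairs for RefSeq nuccore records of a taxon."""
    # Imported here so the two helpers above work without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address on E-utilities requests

    # Step 1: find the nuccore UIDs matching the taxon and the RefSeq filter.
    handle = Entrez.esearch(db="nuccore", term=refseq_term(taxid), retmax=retmax)
    ids = Entrez.read(handle)["IdList"]
    handle.close()
    if not ids:
        return []

    # Step 2: fetch the document summaries for those UIDs.
    handle = Entrez.esummary(db="nuccore", id=",".join(ids))
    summaries = Entrez.read(handle)
    handle.close()

    # Assumed summary keys: "Caption" (accession) and "Title".
    return [(str(s["Caption"]), str(s["Title"])) for s in summaries]
```

Calling only_nc(refseq_records(511145, "you@example.org")) would then play the role of the grep filter. The email address is a placeholder, and retmax caps how many records are summarized.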