Question

Obtaining NCBI GI Numbers from Taxonomy ID (for Entrez efetch query)

3

9.2 years ago

patroos ▴ 70

I'm trying to automatically download FASTA files from the NCBI nucleotide database for a list of taxonomy IDs. I know I can use Entrez's efetch, but it expects GI numbers, which I don't have. Is there a way to fetch by taxonomy ID, or a straightforward, non-manual way to get GI numbers from taxonomy IDs?

Thanks much for any help!

nucleotide Entrez NCBI • 6.7k views

updated 2.0 years ago by Ram 43k • written 9.2 years ago by patroos ▴ 70

0

Duplicate of How To Retrieve All Sequences, From Ncbi, That Belong To A Specific Txid And Its Sub Txids?

updated 2.0 years ago by Ram 43k • written 9.2 years ago by Pierre Lindenbaum 161k

score 6 · Answer 1 · 2015-02-12

6

9.2 years ago

5heikki 11k

This is really simple with Entrez Direct:

epost -db taxonomy -id 63221 | elink -target nuccore | efetch -format uid
196123578
677001457
634744538
634744524
2286205
584458899
513134556
398637208
315623200
270209679
270209678
270209677
262527002
253947345
253947331
253947317
253947303
253947289
222350099
222350097
195972535
158958247
158251955
158251954
111035029
91075865
28557455
11141613
11141612
7769684
4927255

This also works with -target nucgss if that's what you're interested in. You can also skip the GI step entirely and end the pipeline with efetch -format fasta to get the sequences directly.
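For anyone who cannot install Entrez Direct, the same taxonomy-to-nuccore link lookup can be approximated against the E-utilities web service. The following is only a minimal sketch: the function names `elink_url` and `parse_links` are made up for illustration, and it assumes the usual elink XML layout (`LinkSetDb` elements holding a `LinkName` and `Link/Id` children). It builds the request URL and parses a response, but leaves the actual HTTP call to you.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Base URL of the NCBI E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def elink_url(taxid, target="nuccore"):
    """Build an elink request URL going from the taxonomy database to `target`."""
    params = {"dbfrom": "taxonomy", "db": target, "id": str(taxid)}
    return EUTILS + "/elink.fcgi?" + urllib.parse.urlencode(params)


def parse_links(xml_text):
    """Return {link name: [linked IDs]} from an elink XML response.

    Assumes the response contains LinkSetDb elements, each with a
    LinkName and a series of Link/Id children.
    """
    root = ET.fromstring(xml_text)
    links = {}
    for link_set in root.iter("LinkSetDb"):
        name = link_set.findtext("LinkName")
        links[name] = [el.text for el in link_set.findall("Link/Id")]
    return links
```

Fetching `elink_url(63221)` (for example with `urllib.request.urlopen`) and passing the decoded body to `parse_links` should give the same kind of ID list as the pipeline above, grouped by link name.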

1

Fantastic, this is very useful. Thanks!

9.2 years ago by patroos ▴ 70

score 3 · Answer 2 · 2015-02-12

3

9.2 years ago

patroos ▴ 70

I haven't found a great solution for this, but the workaround I am using now is to download /pub/taxonomy/gi_taxid_nucl.dmp.gz from the NCBI FTP server. The file maps GI numbers to taxonomy IDs, and I search it to get the GI numbers for a given taxonomy ID.
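The search over the dump file could look roughly like this. It is a sketch, assuming each line of the gzipped file holds a GI number and a taxonomy ID separated by a tab; the function name `gis_for_taxids` is hypothetical. The file is streamed line by line so it never has to fit in memory.

```python
import gzip


def gis_for_taxids(dump_path, taxids):
    """Scan a gzipped GI-to-taxid dump and collect the GIs for the given taxids.

    Assumes one "<gi>\t<taxid>" pair per line. Returns a dict keyed by
    taxid (as a string) with a list of GI strings for each.
    """
    wanted = {str(t) for t in taxids}
    hits = {t: [] for t in wanted}
    with gzip.open(dump_path, "rt") as handle:
        for line in handle:
            gi, _, taxid = line.rstrip("\n").partition("\t")
            if taxid in wanted:
                hits[taxid].append(gi)
    return hits
```

Note that this only matches the exact taxonomy IDs you pass in; sequences assigned to child taxa (subspecies, strains) would need their own IDs added to the list.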

updated 2.0 years ago by Ram 43k • written 9.2 years ago by patroos ▴ 70

Ram · Answer 3 · 2015-02-12

The elink util will turn up cross-references between NCBI databases.

In this case you want to find links between the taxonomy and nucleotide databases. Here's a demo using the R package rentrez (there are similar libraries for pretty much all popular scripting languages, and even command-line utilities for this):

library(rentrez)

## find sequences linked to a taxid
tax_seqs <- entrez_link(db = "nuccore", dbfrom = "taxonomy", id = 5911)
#elink result with ids from 2 databases:
#[1] taxonomy_nuccore        taxonomy_nucleotide_exp

## grab them

tmp <- tempfile()
# NOTE: in newer rentrez releases the IDs sit under tax_seqs$links$taxonomy_nuccore
recs <- entrez_fetch(db = "nuccore", id = tax_seqs$taxonomy_nuccore[1:3], rettype = "fasta")
cat(recs, file=tmp)
ape::read.dna(tmp, format="fasta")
#3 DNA sequences in binary format stored in a list.
#
# Mean sequence length: 1493
# Shortest sequence: 1137
# Longest sequence: 1779
#
# Labels: gi|697738807|gb|KM406498.1| Tetrahymena thermophila Pat2 mRNA, complete cds
# gi|697738801|gb|KM406497.1| Tetrahymena thermophila Tpt1 mRNA, complete cds
# gi|697738796|gb|KM406496.1| Tetrahymena thermophila Pat1 mRNA, complete cds
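The demo above fetches only the first three linked records. To fetch every linked ID, it usually helps to split the list into batches rather than send one huge request. Here is a rough Python sketch of that idea against the efetch endpoint; the helper name `efetch_fasta_urls` and the default batch size of 200 are my own assumptions, not anything NCBI prescribes.

```python
import urllib.parse

# efetch endpoint of the NCBI E-utilities service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_urls(ids, batch_size=200):
    """Build one efetch URL per batch of IDs, requesting FASTA as plain text.

    batch_size is an arbitrary illustrative default.
    """
    ids = [str(i) for i in ids]
    urls = []
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        params = {
            "db": "nuccore",
            "id": ",".join(chunk),
            "rettype": "fasta",
            "retmode": "text",
        }
        urls.append(EFETCH + "?" + urllib.parse.urlencode(params))
    return urls
```

Each URL can then be downloaded in turn and the responses concatenated into a single FASTA file; pausing briefly between requests is a good idea so you stay within NCBI's usage limits.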

score 0 · Answer 4 · 2015-05-23

0

8.9 years ago

natasha.sernova ★ 4.0k

It's much simpler: see the BioPerl EUtilities Cookbook at http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook.

Good luck!
