Question

NCBI Accession Number to Taxonomy ID

5

5.4 years ago

mrsmith ▴ 50

I am trying to convert a long list of NCBI accession numbers from the nt database into taxonomy IDs so that I can get lineage information for a large file of 16S BLAST results. I would just rerun the BLAST searches with a different output format, but they took a very long time to run on 12 samples. Additionally, I have already filtered the BLAST results down to the best hit for each read and organized those best hits into a table of counts for each sample. I would love to take the accession numbers in my table of counts, convert them into taxonomy IDs, and then use https://github.com/zyxue/ncbitax2lin to convert the taxonomy IDs into lineage information. However, I am stuck: I can't figure out how to convert the accession numbers to taxonomy IDs.

I have tried the ETE toolkit to no avail. Yes, I have already looked at the post "Accession number to taxonomy id after blasting", but it didn't help me much.

I am really new to this whole bioinformatics thing, and I'm feeling a little lost and could really use the help of some BioStars like yourselves! I apologize in advance for my inexperience. I appreciate any info or direction you can give me!

ncbi 16s blast • 12k views

ADD COMMENT • link updated 5.1 years ago by Ming ▴ 110 • written 5.4 years ago by mrsmith ▴ 50

0

Hey there,

I am trying to do the same thing but am running into some problems.

S620100019205:~/Documents/CaoBin/October-2018/trimmed_duk_kmer31/Assembly-Megahit/MFC280618_megahit/BLAST/Input$ cat accession.txt | epost -db nuccore | esummary -db nuccore | xtract -pattern DocumentSummary -element Caption,TaxId

ERROR in fetch input: Search Backend failed: read request has timed out. peer: 130.14.18.27:7011

Could anyone kindly advise?

Thanks

ADD REPLY • link 5.1 years ago by Ming ▴ 110

0

The query in @vkkodali's answer below is working for me, so this was either a temporary issue or, if the problem persists, something on your end. Look into your local firewall settings, since it looks like a port may be blocked locally.

ADD REPLY • link 5.1 years ago by GenoMax 141k

score 6 · Answer 1 · 2018-12-07

6

5.4 years ago

vkkodali_ncbi ★ 3.7k

You can use Entrez Direct for this.

esummary -db nuccore -id NM_002826 | xtract -pattern DocumentSummary -element Caption,TaxId
NM_002826      9606

If you have a lot of accessions, you can use epost to post the whole list first and then pipe the result to esummary as follows:

cat <filename> | epost -db nuccore | esummary -db nuccore | xtract -pattern DocumentSummary -element Caption,TaxId
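
For very long lists, one option is to run the same pipeline in batches. The following is only a sketch under stated assumptions: the file names (accessions.txt, chunk_*, acc2taxid.tsv) and the batch size are hypothetical and not from this thread, and the Entrez Direct step is skipped if epost is not on the PATH.

```shell
#!/bin/sh
# Sketch: run the epost | esummary | xtract pipeline one batch at a time.
# Assumptions (not from the thread): all file names and the batch size.

# Hypothetical input file, one accession per line.
cat > accessions.txt <<'EOF'
MG576168.1
AB948667.1
DQ125562.1
DQ836750.1
FN296805.1
JQ041442.1
MF112006.1
KY643688.1
EOF

# Split into files of at most 3 lines each (chunk_aa, chunk_ab, ...).
# A real run would use a much larger batch size.
split -l 3 accessions.txt chunk_

# Start with an empty result file, then append one batch at a time.
: > acc2taxid.tsv
for f in chunk_*; do
    if command -v epost >/dev/null 2>&1; then
        # Same pipeline as above, applied to a single batch.
        epost -db nuccore < "$f" \
            | esummary -db nuccore \
            | xtract -pattern DocumentSummary -element Caption,TaxId \
            >> acc2taxid.tsv
    else
        echo "Entrez Direct not found; skipping $f" >&2
    fi
done
```

Smaller batches mean that a failed request only costs one batch, which can be rerun on its own.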

ADD COMMENT • link 5.4 years ago by vkkodali_ncbi ★ 3.7k

0

That worked fantastically! Thanks for your help, I really appreciate it!

ADD REPLY • link 5.4 years ago by mrsmith ▴ 50

0

Has anyone tried to download Entrez Direct recently? I'm using the command below, as directed here: https://www.ncbi.nlm.nih.gov/books/NBK179288/

sh -c "$(wget -q ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/install-edirect.sh -O -)"

It keeps failing with a bunch of errors and never downloads correctly.

ADD REPLY • link 4.4 years ago by drikaul ▴ 20

0

Are you using Mac or Linux? Specifically, do you have wget on your machine? Can you paste the error you are seeing? I just tried the same command on my Linux machine and it works fine.

Alternatively, you can download the install-edirect.sh script from the FTP path here: https://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/ and run it from your bash shell.

ADD REPLY • link 4.4 years ago by vkkodali_ncbi ★ 3.7k

GenoMax · Answer 2 · 2018-12-07

3

5.4 years ago

GenoMax 141k

You can use NCBI unix utils to get this information. An example:

$ efetch -db nuccore -id "U20753.1" -format docsum | xtract -pattern DocumentSummary -element TaxId
9685

If you post some examples of your accession numbers I am happy to check them.

ADD COMMENT • link 5.4 years ago by GenoMax 141k

0

Thanks so much for your help!

I may not have been detailed enough in my initial question. My current file looks like this:

                BC04    BC05    BC16    BC17    BC28    BC29    BC40    BC41    BC52    BC64    BC76    BC88
MG576168.1      0       0       0       0       0       0       0       0       0       0       1       1
AB948667.1      0       0       0       1       0       0       0       0       0       0       0       0
DQ125562.1      1       25      2       21      0       13      0       0       0       2       6       7
DQ836750.1      0       0       0       0       0       1       0       0       0       0       0       0
FN296805.1      2       1       2       5       5       5       6       3       2       4       2       2
JQ041442.1      0       0       0       0       0       0       1       0       0       0       2       2
MF112006.1      1       0       0       0       0       0       0       0       0       0       0       0
KY643688.1      0       0       0       0       0       0       0       1       0       0       0       0

...and so on, for about 10,000 accession numbers. Is there a way to convert these accession numbers to taxonomy IDs using the NCBI unix tools (or anything else)?

ADD REPLY • link updated 5.4 years ago by GenoMax 141k • written 5.4 years ago by mrsmith ▴ 50

2

Looks like the data in the first column are accessions. I am not sure what the items in the first row are. First, you need to get all the accessions into a text file that looks something like this:

$ cat temp.txt
MG576168.1
AB948667.1
DQ125562.1
DQ836750.1
FN296805.1
JQ041442.1
MF112006.1
KY643688.1
$ cat temp.txt | epost -db nuccore | esummary -db nuccore | xtract -pattern DocumentSummary -element Caption,TaxId
MG576168        219572
MF112006        306
KY643688        1886637
AB948667        77133
JQ041442        77133
DQ836750        77133
DQ125562        77133
FN296805        77133
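
The first step (getting the accessions out of the count table and into a text file) could be sketched as follows. This assumes a hypothetical tab- or space-separated file called counts.tsv whose first line holds only the sample names and whose remaining lines start with an accession; the tiny table written here is only a stand-in for the real one.

```shell
#!/bin/sh
# Sketch: extract the first column of a count table into temp.txt.
# Assumption (not from the thread): the table is stored as counts.tsv.

# Stand-in table: a header of sample names, then accession + counts.
printf 'BC76\tBC88\nMG576168.1\t1\t1\nAB948667.1\t0\t0\n' > counts.tsv

# Skip the header line, ignore blank lines, keep column 1, deduplicate.
awk 'NR > 1 && NF { print $1 }' counts.tsv | sort -u > temp.txt

cat temp.txt
```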

You will want to read up on Entrez Direct (the NCBI e-utils on the unix command line) if you want to do this yourself.
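
Note that in the output above the rows do not come back in input order, and the Caption field carries no version suffix (MG576168 rather than MG576168.1). One way to attach the taxonomy IDs back onto the count table is therefore to join on the accession with its version stripped. This is a rough sketch, assuming the xtract output was saved to a hypothetical tab-separated file acc2taxid.tsv and the count table is a hypothetical tab-separated counts.tsv; the two small files written here reuse values shown in this thread.

```shell
#!/bin/sh
# Sketch: add a taxid column to the count table by joining on the
# accession without its version suffix.
# Assumptions (not from the thread): file names and tab-separated layout.

# Stand-in inputs built from values shown in the thread.
printf 'BC76\tBC88\nMG576168.1\t1\t1\nAB948667.1\t0\t0\n' > counts.tsv
printf 'MG576168\t219572\nAB948667\t77133\n' > acc2taxid.tsv

awk -F'\t' -v OFS='\t' '
    # First file: remember Caption -> TaxId.
    NR == FNR { taxid[$1] = $2; next }
    # Header of the count table lists sample names only; add two labels.
    FNR == 1  { print "accession", "taxid", $0; next }
    {
        acc = $1
        sub(/\.[0-9]+$/, "", acc)   # MG576168.1 -> MG576168
        # Insert the taxid (or NA if missing) as a new second column.
        $1 = $1 OFS ((acc in taxid) ? taxid[acc] : "NA")
        print
    }
' acc2taxid.tsv counts.tsv > counts_with_taxid.tsv

cat counts_with_taxid.tsv
```

Accessions that have no match in acc2taxid.tsv are marked NA so they are easy to spot and retry.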