NCBI Accession Number to Taxonomy ID
5.4 years ago
mrsmith ▴ 50

I am trying to convert a long list of NCBI accession numbers from the nt database into taxonomy IDs so that I can get lineage information for a large file of 16S BLAST results. I would just rerun BLAST with a different output format, but it took SO long to run these searches on 12 samples. In addition, I have already filtered the BLAST results down to the best hit for each read and organized those best hits into a table of counts for each sample. I would LOVE to take the accession numbers in my table of counts, convert them into taxonomy IDs, and then use https://github.com/zyxue/ncbitax2lin to convert the taxonomy IDs into lineage info. However, I am stuck and can't figure out how to convert the accession numbers to taxonomy IDs.

I have tried the ETE toolkit, to no avail. And yes, I have already looked at the post "Accession number to taxonomy id after blasting", but it didn't help me much.

I am really new to this whole bioinformatics thing, I'm feeling a little lost, and I could really use the help of some BioStars like yourselves! Sorry in advance for my ineptness. I appreciate any info or direction you can give me!


Hey there,

I am trying to do the same thing but am running into some problems.

$ cat accession.txt | epost -db nuccore | esummary -db nuccore | xtract -pattern DocumentSummary -element Caption,TaxId

ERROR in fetch input: Search Backend failed: read request has timed out. peer: 130.14.18.27:7011

Could anyone kindly advise?

Thanks


The query in @vkkodali's answer below works for me, so this was either a temporary issue or, if the problem persists, something on your end. Check your local firewall settings, since the timeout suggests a port may be blocked locally.

5.4 years ago
vkkodali_ncbi ★ 3.7k

You can use Entrez Direct for this.

esummary -db nuccore -id NM_002826 | xtract -pattern DocumentSummary -element Caption,TaxId
NM_002826      9606

If you have a lot of accessions, you can use epost to post the whole list first and then pipe it to esummary, as follows:

cat <filename> | epost -db nuccore | esummary -db nuccore | xtract -pattern DocumentSummary -element Caption,TaxId
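With a list the size of the one in this thread (about 10,000 accessions), a single epost call is one point of failure; the "Search Backend failed: read request has timed out" error above is an example. Below is a minimal sketch, assuming bash and Entrez Direct on your PATH, that runs the same epost | esummary | xtract pipeline in batches and retries a batch that fails. The function name acc2taxid_batched and its defaults (500 accessions per batch, 3 attempts, 5 s pause) are my own hypothetical choices, not NCBI-documented limits.

```shell
#!/usr/bin/env bash
# Sketch: run the epost | esummary | xtract lookup in batches, retrying a
# batch that fails (e.g. a backend timeout).
# batch_size (default 500) and max_tries (default 3) are arbitrary choices,
# not NCBI-documented limits.
# The body runs in a subshell ( ... ) so "set -o pipefail" does not leak
# into the calling shell.
acc2taxid_batched() (
  set -o pipefail
  infile="$1"
  batch_size="${2:-500}"
  max_tries="${3:-3}"
  tmpdir=$(mktemp -d) || exit 1
  # chunk_aa, chunk_ab, ... each hold at most $batch_size accessions
  split -l "$batch_size" "$infile" "$tmpdir/chunk_"
  for chunk in "$tmpdir"/chunk_*; do
    [ -e "$chunk" ] || continue          # empty input: no chunks were created
    ok=0
    for try in $(seq 1 "$max_tries"); do
      # With pipefail, a failure in any of the three steps fails the pipeline
      if epost -db nuccore < "$chunk" | esummary -db nuccore |
           xtract -pattern DocumentSummary -element Caption,TaxId > "$tmpdir/out"; then
        ok=1
        break
      fi
      echo "batch $(basename "$chunk"): attempt $try failed" >&2
      [ "$try" -lt "$max_tries" ] && sleep 5
    done
    if [ "$ok" -eq 1 ]; then
      cat "$tmpdir/out"
    else
      echo "batch $(basename "$chunk"): giving up after $max_tries attempts" >&2
    fi
  done
  rm -rf "$tmpdir"
)
```

Usage would be something like acc2taxid_batched accessions.txt 500 3 > taxids.txt (file names hypothetical). pipefail matters here: without it, the pipeline's exit status is xtract's, so a failed epost could be silently treated as an empty but successful batch.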

That worked fantastically! Thanks for your help, I really appreciate it!


Has anyone tried to download Entrez Direct recently? I'm using the command below, as directed here: https://www.ncbi.nlm.nih.gov/books/NBK179288/

sh -c "$(wget -q ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/install-edirect.sh -O -)"

It keeps failing with a bunch of errors and doesn't download correctly.


Are you using Mac or Linux? Specifically, do you have wget on your machine? Can you paste the error you are seeing? I just tried the same command on my Linux machine and it works fine.

Alternatively, you can download the install-edirect.sh script from the FTP path here: https://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/ and run it from your bash shell.

5.4 years ago
GenoMax 141k

You can use the NCBI Unix utilities (Entrez Direct) to get this information. An example:

$ efetch -db nuccore -id "U20753.1" -format docsum | xtract -pattern DocumentSummary -element TaxId
9685

If you post some examples of your accession numbers, I am happy to check them.


Thanks so much for your help!

I may not have been detailed enough in my initial question. My current file looks like this:

            BC04  BC05  BC16  BC17  BC28  BC29  BC40  BC41  BC52  BC64  BC76  BC88
MG576168.1  0     0     0     0     0     0     0     0     0     0     1     1
AB948667.1  0     0     0     1     0     0     0     0     0     0     0     0
DQ125562.1  1     25    2     21    0     13    0     0     0     2     6     7
DQ836750.1  0     0     0     0     0     1     0     0     0     0     0     0
FN296805.1  2     1     2     5     5     5     6     3     2     4     2     2
JQ041442.1  0     0     0     0     0     0     1     0     0     0     2     2
MF112006.1  1     0     0     0     0     0     0     0     0     0     0     0
KY643688.1  0     0     0     0     0     0     0     1     0     0     0     0

...and so on, for about 10,000 accession numbers. Is there a way to get the taxonomy IDs for these accessions using the NCBI Unix tools (or anything else)?


It looks like the first column holds the accessions; I am not sure what the items in the first row are. First, you need to get all the accessions into a text file that looks something like this:

$ cat temp.txt
MG576168.1
AB948667.1
DQ125562.1
DQ836750.1
FN296805.1
JQ041442.1
MF112006.1
KY643688.1
$ cat temp.txt | epost -db nuccore | esummary -db nuccore | xtract -pattern DocumentSummary -element Caption,TaxId
MG576168        219572
MF112006        306
KY643688        1886637
AB948667        77133
JQ041442        77133
DQ836750        77133
DQ125562        77133
FN296805        77133
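Rather than building temp.txt by hand for ~10,000 rows, the first column can be pulled out of the counts table with awk. A minimal sketch, assuming the table's first line is the sample-name header (BC04, BC05, ...) as in the question and that columns are separated by tabs or spaces; the function name extract_accessions is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the first column of the counts table, one accession per line.
# Assumes line 1 is the sample-name header and that columns are separated by
# tabs or spaces (awk's default field splitting handles both).
extract_accessions() {
  # NR > 1 skips the header; NF > 0 skips blank lines
  awk 'NR > 1 && NF > 0 { print $1 }' "$1"
}
```

For example, extract_accessions counts.txt > temp.txt, where counts.txt is a hypothetical name for the table in the question.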

You will want to read up on Entrez Direct (the NCBI E-utilities on the Unix command line) if you want to do this yourself.
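In the output above, the Caption column has lost the version suffix (MG576168, not MG576168.1) and the rows come back in a different order from temp.txt, so the TaxIds cannot simply be pasted next to the counts table. A minimal sketch that matches them by accession with the version stripped, assuming the xtract output was saved to a file (e.g. taxids.txt) and the counts table is tab-delimited with a header line; the function name add_taxid_column and both file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: append a TaxId column to the counts table.
#   $1 = xtract output (Caption<TAB>TaxId), e.g. taxids.txt   (hypothetical name)
#   $2 = tab-delimited counts table with a header line        (hypothetical name)
# Caption has no ".1"-style version, so the version is stripped from the
# table accessions before lookup. Accessions with no TaxId get "NA".
add_taxid_column() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == FNR { taxid[$1] = $2; next }      # first file: Caption -> TaxId
       FNR == 1  { print $0, "TaxId"; next }   # header line of the counts table
       NF == 0   { next }                      # skip blank lines
       {
         acc = $1
         sub(/\.[0-9]+$/, "", acc)             # MG576168.1 -> MG576168
         print $0, ((acc in taxid) ? taxid[acc] : "NA")
       }' "$1" "$2"
}
```

Usage: add_taxid_column taxids.txt counts.txt > counts_with_taxid.txt. The TaxId column is appended at the end so the header keeps the same one-short layout as the original table. The NR == FNR idiom assumes taxids.txt is non-empty; if the lookup produced nothing, awk would treat the counts table as the lookup file, so check that step first.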


That worked really well, thanks a lot! I appreciate the help so much!
