Question: NCBI Accession Number to Taxonomy ID
mrsmith10 wrote:

I am trying to convert a long list of NCBI accession numbers from the nt database into taxonomy IDs so that I can get lineage information for a large file of 16S BLAST results. I would just rerun the BLAST with a different output format, but the searches took a very long time to run on 12 samples. I have also already filtered the BLAST results down to the best hit for each read and organized those best hits into a table of counts per sample. I would love to take the accession numbers in my table of counts, convert them into taxonomy IDs, and then use https://github.com/zyxue/ncbitax2lin to convert the taxonomy IDs into lineage information. However, I am stuck: I can't figure out how to convert the accession numbers to taxonomy IDs.

I have tried the ETE toolkit to no avail. Yes, I have already looked at the post "Accession number to taxonomy id after blasting", but it didn't help me much.

I am really new to bioinformatics, feeling a little lost, and could really use the help of some BioStars like yourselves! I apologize in advance for my ineptness. I appreciate any information or pointers you can give me!

blast 16s ncbi • 890 views
modified 3 months ago by Ming30 • written 7 months ago by mrsmith10

Hey there,

I am trying to do the same thing but am running into some problems.

S620100019205:~/Documents/CaoBin/October-2018/trimmed_duk_kmer31/Assembly-Megahit/MFC280618_megahit/BLAST/Input$ cat accession.txt | epost -db nuccore | esummary -db nuccore | xtract -pattern DocumentSummary -element Caption,TaxId

ERROR in fetch input: Search Backend failed: read request has timed out. peer: 130.14.18.27:7011

Could anyone kindly advise?

Thanks

written 3 months ago by Ming30

The query in @vkkodali's answer below is working for me, so this was either a temporary issue or, if the problem persists, something on your end. Look into your local firewall settings, since the error suggests a port may be blocked locally.

modified 3 months ago • written 3 months ago by genomax69k
vkkodali1.1k wrote:

You can use Entrez Direct for this.

esummary -db nuccore -id NM_002826 | xtract -pattern DocumentSummary -element Caption,TaxId
NM_002826      9606

If you have a lot of accessions, you can use epost to post the whole list first and then pipe the result to esummary as follows:

cat <filename> | epost -db nuccore | esummary -db nuccore | xtract -pattern DocumentSummary -element Caption,TaxId
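
For a list of several thousand accessions, one option is to send the list in smaller batches rather than as a single request, so that a failed or timed-out request only costs one batch. The following is a minimal sketch, not a tested recipe: the file names (accessions.txt, batch_*, acc2taxid.tsv), the batch size, and the one-second pause are all illustrative assumptions, and the lookup itself is the same epost | esummary | xtract pipeline shown above.

```shell
# accessions.txt: one accession per line (illustrative file name).
printf '%s\n' MG576168.1 AB948667.1 DQ125562.1 DQ836750.1 \
              FN296805.1 JQ041442.1 MF112006.1 KY643688.1 > accessions.txt

# Split the list into batch files (batch_aa, batch_ab, ...).
# 3 lines per file is used here only to show the mechanics; a few
# hundred per batch is a plausible starting point for a real list.
split -l 3 accessions.txt batch_

# Query each batch in turn and collect all results in one file.
# The lookups are skipped with a message if Entrez Direct is not installed.
if command -v epost >/dev/null 2>&1; then
    for f in batch_*; do
        cat "$f" | epost -db nuccore | esummary -db nuccore |
            xtract -pattern DocumentSummary -element Caption,TaxId
        sleep 1   # brief pause between requests (assumed, adjust as needed)
    done > acc2taxid.tsv
else
    echo "Entrez Direct not found; install it to run the lookups" >&2
fi
```

If one batch fails, only that batch file needs to be rerun.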
written 7 months ago by vkkodali1.1k

That worked fantastically! Thanks for your help, I really appreciate it!

written 7 months ago by mrsmith10
genomax69k wrote:

You can use NCBI unix utils to get this information. An example:

$ efetch -db nuccore -id "U20753.1" -format docsum | xtract -pattern DocumentSummary -element TaxId
9685

If you post some examples of your accession numbers I am happy to check them.

written 7 months ago by genomax69k

Thanks so much for your help!

I may not have been detailed enough in my initial question. My current file looks like this:

BC04    BC05    BC16    BC17    BC28    BC29    BC40    BC41    BC52    BC64    BC76    BC88
MG576168.1      0       0       0       0       0       0       0       0       0       0       1       1
AB948667.1      0       0       0       1       0       0       0       0       0       0       0       0
DQ125562.1      1       25      2       21      0       13      0       0       0       2       6       7
DQ836750.1      0       0       0       0       0       1       0       0       0       0       0       0
FN296805.1      2       1       2       5       5       5       6       3       2       4       2       2
JQ041442.1      0       0       0       0       0       0       1       0       0       0       2       2
MF112006.1      1       0       0       0       0       0       0       0       0       0       0       0
KY643688.1      0       0       0       0       0       0       0       1       0       0       0       0

...and so on, for about 10,000 accession numbers. Is there a way to use the NCBI unix tools (or anything else) to convert these accession numbers into taxonomy IDs?

modified 7 months ago by genomax69k • written 7 months ago by mrsmith10

Looks like the data in the first column are accessions. I am not sure what the items in the first row are. First, you need to get all the accessions into a text file that looks something like this:

$ cat temp.txt
MG576168.1
AB948667.1
DQ125562.1
DQ836750.1
FN296805.1
JQ041442.1
MF112006.1
KY643688.1
$ cat temp.txt | epost -db nuccore | esummary -db nuccore | xtract -pattern DocumentSummary -element Caption,TaxId
MG576168        219572
MF112006        306
KY643688        1886637
AB948667        77133
JQ041442        77133
DQ836750        77133
DQ125562        77133
FN296805        77133
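
One way to produce an accession list like temp.txt from the count table is to drop the header row and keep only the first column. This is a sketch that assumes the table is tab-separated with the accessions in column 1; counts.tsv is an illustrative file name, and the sample rows are the first three samples of three rows from the table posted above.

```shell
# Small stand-in for the count table (tab-separated, header row first).
printf 'BC04\tBC05\tBC16\n' > counts.tsv
printf 'MG576168.1\t0\t0\t0\n' >> counts.tsv
printf 'AB948667.1\t0\t0\t0\n' >> counts.tsv
printf 'DQ125562.1\t1\t25\t2\n' >> counts.tsv

# Skip the header row, keep only the first column, and drop blank lines
# and duplicate accessions so each accession is listed once.
tail -n +2 counts.tsv | cut -f1 | grep -v '^$' | sort -u > temp.txt

cat temp.txt
```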

You will want to read up on Entrez Direct (the NCBI e-utils on the unix command line) if you want to do this yourself.
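
Note that in the output above the Caption column has lost the version suffix (MG576168 rather than MG576168.1) and the rows do not come back in input order, so the results have to be matched back to the count table by accession rather than by position. A minimal awk sketch of that step follows. It assumes both files are tab-separated and that the header row of the count table has no label for the accession column, as in the table posted above; acc2taxid.tsv, counts.tsv, and counts_with_taxid.tsv are illustrative file names.

```shell
# acc2taxid.tsv: saved output of the epost | esummary | xtract pipeline
# (accession without version, tab, TaxId). Values taken from the output above.
printf 'MG576168\t219572\nAB948667\t77133\nDQ125562\t77133\n' > acc2taxid.tsv

# counts.tsv: count table with versioned accessions in column 1
# (first three samples of three rows, for illustration).
printf 'BC04\tBC05\tBC16\n' > counts.tsv
printf 'MG576168.1\t0\t0\t0\nAB948667.1\t0\t0\t0\nDQ125562.1\t1\t25\t2\n' >> counts.tsv

# Pass 1 (acc2taxid.tsv): remember the TaxId for each accession.
# Pass 2 (counts.tsv): label the header, then prefix each data row with
# its TaxId, matching on the accession minus its ".N" version suffix.
# Accessions with no match get "NA".
awk -F'\t' -v OFS='\t' '
    NR == FNR { taxid[$1] = $2; next }
    FNR == 1  { print "TaxId", "Accession", $0; next }
    {
        acc = $1
        sub(/\.[0-9]+$/, "", acc)
        t = (acc in taxid) ? taxid[acc] : "NA"
        print t, $0
    }
' acc2taxid.tsv counts.tsv > counts_with_taxid.tsv

cat counts_with_taxid.tsv
```

The first column of the result can then be passed to a lineage tool such as ncbitax2lin.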

written 7 months ago by vkkodali1.1k

That worked really well, thanks a lot! I appreciate the help so much!

written 7 months ago by mrsmith10