Question: NCBI Accession Number to Taxonomy ID
mrsmith wrote, 10 weeks ago:

I am trying to convert a long list of NCBI accession numbers from the nt database into taxonomy IDs so that I can get lineage information for a large file of 16S BLAST results. I would just rerun the BLAST and change the output format, but it took SOOO long to run these searches on 12 samples. Additionally, I have already filtered the BLAST results down to the best hit for each read and organized those best hits into a table of counts per sample. I would LOVE to just take the accession numbers in my table of counts, convert them into taxonomy IDs, and then use https://github.com/zyxue/ncbitax2lin to convert the taxonomy IDs into lineage info. However, I am stuck: I can't figure out how to convert the accession numbers to taxonomy IDs.

I have tried the ETE toolkit to no avail. Yes, I already looked at "Accession number to taxonomy id after blasting", but that post didn't help me much.

I am really new to this whole bioinformatics thing, and I'm feeling a little lost and could really use the help of some BioStars like yourselves! I am sorry for my ineptness in advance. I appreciate any info or direction that you can lead me in!

Tags: blast, 16s, ncbi
vkkodali (United States) wrote, 10 weeks ago:

You can use Entrez Direct for this.

esummary -db nuccore -id NM_002826 | xtract -pattern DocumentSummary -element Caption,TaxId
NM_002826      9606

If you have a lot of accessions, you can first use epost to post the whole list and then pipe the result to esummary as follows:

cat <filename> | epost -db nuccore | esummary -db nuccore | xtract -pattern DocumentSummary -element Caption,TaxId
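
If a single request for a very long list turns out to be slow or fails, one option is to split the list and run the same pipeline once per chunk. The following is only a minimal sketch and not part of the original answer: the file name accessions.txt, the chunk_ prefix, and the tiny chunk size are assumptions for illustration, and the EDirect step only runs if epost is found on the PATH.

```shell
# Hypothetical input: one accession per line (five shown; the real list is much longer).
printf '%s\n' MG576168.1 AB948667.1 DQ125562.1 DQ836750.1 FN296805.1 > accessions.txt

# Split into files of at most 2 lines each (chunk_aa, chunk_ab, ...).
# 2 is only for illustration; a realistic chunk size is an untuned assumption.
split -l 2 accessions.txt chunk_

# Run the pipeline from the answer above on each chunk, if EDirect is installed.
if command -v epost >/dev/null 2>&1; then
  for f in chunk_*; do
    cat "$f" | epost -db nuccore | esummary -db nuccore |
      xtract -pattern DocumentSummary -element Caption,TaxId
  done > taxids.tsv
fi
```

Because every chunk goes through the identical command, the combined taxids.tsv should have the same two-column layout as the single-request output.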

That worked fantastically! Thanks for your help, I really appreciate it!

(reply by mrsmith, 10 weeks ago)
genomax (United States) wrote, 10 weeks ago:

You can use NCBI unix utils to get this information. An example:

$ efetch -db nuccore -id "U20753.1" -format docsum | xtract -pattern DocumentSummary -element TaxId
9685

If you post some examples of your accession numbers I am happy to check them.


Thanks so much for your help!

I may not have been detailed enough in my initial question. My current file looks like this:

                BC04    BC05    BC16    BC17    BC28    BC29    BC40    BC41    BC52    BC64    BC76    BC88
MG576168.1      0       0       0       0       0       0       0       0       0       0       1       1
AB948667.1      0       0       0       1       0       0       0       0       0       0       0       0
DQ125562.1      1       25      2       21      0       13      0       0       0       2       6       7
DQ836750.1      0       0       0       0       0       1       0       0       0       0       0       0
FN296805.1      2       1       2       5       5       5       6       3       2       4       2       2
JQ041442.1      0       0       0       0       0       0       1       0       0       0       2       2
MF112006.1      1       0       0       0       0       0       0       0       0       0       0       0
KY643688.1      0       0       0       0       0       0       0       1       0       0       0       0

...etc., for about 10,000 accession numbers. Is there a way to convert these accession numbers into taxonomy IDs using the NCBI unix tools (or whatever)?

(reply by mrsmith, 10 weeks ago; modified by genomax)

Looks like the data in the first column are accessions. I am not sure what the items in the first row are. First, you need to get all the accessions into a text file that looks something like this:

$ cat temp.txt
MG576168.1
AB948667.1
DQ125562.1
DQ836750.1
FN296805.1
JQ041442.1
MF112006.1
KY643688.1
$ cat temp.txt | epost -db nuccore | esummary -db nuccore | xtract -pattern DocumentSummary -element Caption,TaxId
MG576168        219572
MF112006        306
KY643688        1886637
AB948667        77133
JQ041442        77133
DQ836750        77133
DQ125562        77133
FN296805        77133
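
Getting the accessions into a text file, as described above, can be sketched with awk. This is not from the original reply. It assumes the counts table is tab-separated, is stored in a hypothetical file named counts.tsv, and has its header row on line 1; the sample rows here are trimmed to two count columns.

```shell
# Hypothetical, trimmed copy of the counts table (tab-separated).
# The header row holds only the column labels, so it has one field fewer than the data rows.
printf 'BC04\tBC05\n'          >  counts.tsv
printf 'MG576168.1\t0\t0\n'    >> counts.tsv
printf 'AB948667.1\t0\t0\n'    >> counts.tsv
printf 'DQ125562.1\t1\t25\n'   >> counts.tsv

# Skip the header (line 1) and any blank lines, then keep only the first column.
awk -F'\t' 'NR > 1 && NF > 0 { print $1 }' counts.tsv > temp.txt

cat temp.txt
```

The resulting temp.txt has one accession per line, which is the shape the epost command expects on standard input.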

You will want to read up on Entrez Direct (the NCBI e-utils on the unix command line) if you want to do this yourself.
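
Note that in the output above, Caption is the accession without its version suffix and the rows do not come back in the input order, so the TaxIds have to be matched back to the counts table by accession rather than by line position. A rough sketch of that join follows. It is not from the original reply and assumes the xtract output was saved as tab-separated taxids.tsv and the counts table as counts.tsv (both hypothetical names, with trimmed sample data).

```shell
# Hypothetical xtract output (Caption, TaxId), deliberately in a different order than the table.
printf 'DQ125562\t77133\n'  >  taxids.tsv
printf 'MG576168\t219572\n' >> taxids.tsv
printf 'AB948667\t77133\n'  >> taxids.tsv

# Hypothetical, trimmed counts table; the header row holds only the column labels.
printf 'BC04\tBC05\n'        >  counts.tsv
printf 'MG576168.1\t0\t0\n'  >> counts.tsv
printf 'AB948667.1\t0\t0\n'  >> counts.tsv
printf 'DQ125562.1\t1\t25\n' >> counts.tsv

awk -F'\t' -v OFS='\t' '
  NR == FNR { taxid[$1] = $2; next }                 # first file: unversioned accession -> TaxId
  FNR == 1  { print "accession", "taxid", $0; next } # counts header: prepend two column names
  NF == 0   { next }                                 # ignore blank lines
  {
    acc = $1
    sub(/\.[0-9]+$/, "", acc)                        # drop the ".1" style version suffix
    $1 = $1 OFS ((acc in taxid) ? taxid[acc] : "NA") # insert TaxId (or NA) after the accession
    print
  }
' taxids.tsv counts.tsv > counts_with_taxid.tsv

cat counts_with_taxid.tsv
```

Accessions with no match are marked NA rather than dropped, so the row count of the table stays the same and missing lookups are easy to spot.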

(reply by vkkodali, 10 weeks ago)

That worked really well, thanks a lot! I appreciate the help so much!

(reply by mrsmith, 10 weeks ago)