How to have species taxonomy appear with Taxonomy ids; Greengenes reference db
7.8 years ago

I have about 100,000 sequences and I am running BLAST against Greengenes (the latest version) as the reference database. Whenever I run it, I get output like this:

Query= SWED-1-1_0 HISEQ:265:HHK2LBCXX:1:1101:3356:2270 1:N:0:ACAGCAGA orig_bc=AAAAAAAAAAAA new_bc=AAAAAAAAAAAA bc_diffs=0

Length=429
                                                 Score     E
Sequences producing significant alignments:     (Bits)  Value

4469610                                           787    0.0
4451440                                           787    0.0
714887                                            787    0.0
887750                                            787    0.0

I would like to know how to have the actual taxonomy appear alongside the ids. For example, it would say "4469610 k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Neisseriales; f__Neisseriaceae; g__; s__". I have the Greengenes taxonomy text file; I just need to know how to make it appear in the output. Apparently one way to do this is to add the taxonomy to the header lines of the FASTA file, such as ">idxxx taxonomy\nseq", but I have not gotten that to work, nor have I found any other information on something like this. Any help is appreciated.
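The header approach mentioned above could look something like the following minimal sketch. It assumes the taxonomy file has one "id<TAB>lineage" entry per line and that the first whitespace-delimited token of each FASTA header is the Greengenes id; the function names and the toy sequence are made up for illustration. The BLAST database would then need to be rebuilt from the rewritten FASTA file before the lineages can show up in the results.

```python
import io


def load_taxonomy(handle):
    """Read 'id<TAB>lineage' lines into a dict (assumed file layout)."""
    taxonomy = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        seq_id, _, lineage = line.partition("\t")
        taxonomy[seq_id.strip()] = lineage.strip()
    return taxonomy


def annotate_fasta(fasta_in, taxonomy, fasta_out):
    """Copy a FASTA stream, appending the lineage to each known header."""
    for line in fasta_in:
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            seq_id = parts[0] if parts else ""
            lineage = taxonomy.get(seq_id)
            if lineage is not None:
                fasta_out.write(f">{seq_id} {lineage}\n")
                continue
        # Sequence lines and headers with unknown ids pass through unchanged.
        fasta_out.write(line)


if __name__ == "__main__":
    # Toy data: the id and lineage come from the question, the sequence is invented.
    tax_file = io.StringIO(
        "4469610\tk__Bacteria; p__Proteobacteria; c__Betaproteobacteria; "
        "o__Neisseriales; f__Neisseriaceae; g__; s__\n"
    )
    fasta_file = io.StringIO(">4469610\nACGTACGT\n")
    out = io.StringIO()
    annotate_fasta(fasta_file, load_taxonomy(tax_file), out)
    print(out.getvalue(), end="")
```

With real files, the io.StringIO objects would be replaced by open(...) handles on the taxonomy file, the reference FASTA, and a new output FASTA.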

RNA-Seq sequencing alignment blast
7.8 years ago
natasha.sernova ★ 4.0k

See this post:

how to map greengenes taxonomy locally

The answer at the bottom of that post covers taxonomy.

These resources may also help:

An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3280142/

assign_taxonomy.py – Assign taxonomy to each sequence

http://qiime.org/scripts/assign_taxonomy.html
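Separately from the resources above, the ids can also be mapped to lineages after BLAST has run. The sketch below is one possible way to do that, not the method from the linked post. It assumes the hits were written in tabular format (for example with -outfmt 6, where the second column is the subject id) and that the taxonomy file has one "id<TAB>lineage" entry per line; the function names and the toy hit row are hypothetical.

```python
import csv
import io


def load_taxonomy(handle):
    """Read 'id<TAB>lineage' lines into a dict (assumed file layout)."""
    taxonomy = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        seq_id, _, lineage = line.partition("\t")
        taxonomy[seq_id.strip()] = lineage.strip()
    return taxonomy


def annotate_hits(blast_tsv, taxonomy, out, subject_col=1):
    """Append a lineage column to each tab-separated BLAST hit."""
    reader = csv.reader(blast_tsv, delimiter="\t")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        if len(row) <= subject_col:
            continue  # skip blank or truncated lines
        lineage = taxonomy.get(row[subject_col], "unassigned")
        writer.writerow(row + [lineage])


if __name__ == "__main__":
    # Toy data: ids and lineage come from the question, other columns are invented.
    tax_file = io.StringIO(
        "4469610\tk__Bacteria; p__Proteobacteria; c__Betaproteobacteria; "
        "o__Neisseriales; f__Neisseriaceae; g__; s__\n"
    )
    hits = io.StringIO("SWED-1-1_0\t4469610\t100.0\t787\n")
    out = io.StringIO()
    annotate_hits(hits, load_taxonomy(tax_file), out)
    print(out.getvalue(), end="")
```

This leaves the reference database untouched, so the same BLAST results can be re-annotated if the taxonomy file changes.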
