How to get "best" taxon identifier from diamond output with staxids?
18 months ago
O.rka

I recently discovered the staxids output field in DIAMOND (note the plural: there is no singular staxid field in DIAMOND). I'm trying to assign taxonomy IDs to all of my ORFs, but in many cases a single hit has two or more taxids (sometimes many more).

What is the recommended way to pick the "best" one? I don't want to choose one at random, just take the first, etc. Is there a systematic, robust way to do this, for example by picking the "most reliable" one?
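A common systematic alternative to picking one taxid is to report the lowest common ancestor (LCA) of all candidates, which is the idea behind LCA-based classifiers such as MEGAN. DIAMOND itself also has an LCA-style taxonomic classification output (`--outfmt 102`, which needs the taxonomy files); check the DIAMOND documentation for the exact options. Below is a minimal sketch of the LCA step, assuming a child-to-parent map like the one in NCBI's nodes.dmp. The tiny tree in `TOY_PARENTS` is hypothetical and does not correspond to real NCBI taxids.

```python
"""Lowest common ancestor (LCA) of several taxids, given a child -> parent map."""

from typing import Dict, Iterable, List

# Hypothetical toy taxonomy (NOT real NCBI IDs): 1 = root, 2 = genus,
# 3 = species, 4 = a strain of species 3, 5 = a second species in genus 2.
# As in nodes.dmp, the root is its own parent.
TOY_PARENTS: Dict[int, int] = {1: 1, 2: 1, 3: 2, 4: 3, 5: 2}


def lineage(taxid: int, parents: Dict[int, int]) -> List[int]:
    """Return the path from taxid up to the root, leaf first."""
    path = [taxid]
    while parents[taxid] != taxid:
        taxid = parents[taxid]
        path.append(taxid)
    return path


def lca(taxids: Iterable[int], parents: Dict[int, int]) -> int:
    """Return the deepest taxid shared by the lineages of all input taxids."""
    taxids = list(taxids)
    if not taxids:
        raise ValueError("need at least one taxid")
    first = lineage(taxids[0], parents)
    shared = set(first)
    for t in taxids[1:]:
        shared &= set(lineage(t, parents))
    # Walk the first lineage from the leaf upward; the first shared node is the deepest.
    for node in first:
        if node in shared:
            return node
    raise ValueError("taxids do not share a root")


if __name__ == "__main__":
    print(lca([4, 3], TOY_PARENTS))  # a strain and its own species
    print(lca([4, 5], TOY_PARENTS))  # two different species in one genus
```

With a strain and its parent species the LCA is the species; with two species it falls back to the genus, so the call becomes less specific instead of arbitrary.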

Here is an example hit. I can't use a regex to pull the organism name out of the [brackets] in stitle, because not all titles have this suffix.

qseqid                                      NODE_100002_length_1286_cov_2.42892_1132_1285_-
sseqid                                                                       WP_021626941.1
pident                                                                                 94.4
length                                                                                   18
mismatch                                                                                  1
gapopen                                                                                   0
qstart                                                                                    1
qend                                                                                     18
sstart                                                                                  110
send                                                                                    127
evalue                                                                                 0.22
bitscore                                                                               44.3
staxids                                                                     1227265;1227266
sscinames    Capnocytophaga sp. oral taxon 863;Capnocytophaga sp. oral taxon 863 str. F0517
stitle              WP_021626941.1 hypothetical protein [Capnocytophaga sp. oral taxon 863]
Name: 6422120, dtype: object
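Whatever selection rule is used, the semicolon-separated staxids field first has to be split into a list per hit. A minimal stdlib sketch, assuming the run used `--outfmt 6` with exactly the columns shown in the example above, in that order (adjust `FIELDS` to match your command). `EXAMPLE` rebuilds the example hit as one tab-separated line.

```python
"""Parse DIAMOND tabular output and split the multi-valued staxids column."""

import csv
import io
from typing import Iterator, List

# Assumed column order, taken from the example hit above.
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
    "staxids", "sscinames", "stitle",
]

EXAMPLE = "\t".join([
    "NODE_100002_length_1286_cov_2.42892_1132_1285_-", "WP_021626941.1",
    "94.4", "18", "1", "0", "1", "18", "110", "127", "0.22", "44.3",
    "1227265;1227266",
    "Capnocytophaga sp. oral taxon 863;Capnocytophaga sp. oral taxon 863 str. F0517",
    "WP_021626941.1 hypothetical protein [Capnocytophaga sp. oral taxon 863]",
]) + "\n"


def split_taxids(field: str) -> List[int]:
    """'1227265;1227266' -> [1227265, 1227266], dropping blanks and duplicates."""
    taxids: List[int] = []
    for part in field.split(";"):
        part = part.strip()
        if part and int(part) not in taxids:
            taxids.append(int(part))
    return taxids


def read_hits(handle) -> Iterator[dict]:
    """Yield one dict per hit, with staxids as a list of ints and bitscore as float."""
    # QUOTE_NONE so quote characters inside stitle are kept as plain text.
    reader = csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    for row in reader:
        row["staxids"] = split_taxids(row["staxids"] or "")
        row["bitscore"] = float(row["bitscore"])
        yield row


if __name__ == "__main__":
    for hit in read_hits(io.StringIO(EXAMPLE)):
        print(hit["sseqid"], hit["staxids"])
```

For a real file, pass an open file handle instead of `io.StringIO(EXAMPLE)`.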
Tags: metagenomics, blast, diamond, alignment, protein

What do you mean by the best one? Both point to the same genus. And if you are using an 18-aa hit, that is likely too short to give you much confidence in the assignment.
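Following that point, one option is to drop short or weak hits before assigning any taxonomy. A minimal sketch; the cutoffs `MIN_ALN_LEN` and `MAX_EVALUE` are hypothetical placeholders chosen for illustration, not recommended values. Hits are dicts keyed by the output column names, with values still as strings.

```python
"""Drop hits too short or too weak to support a taxonomic call."""

from typing import Dict, List

MIN_ALN_LEN = 30   # hypothetical minimum alignment length (aa)
MAX_EVALUE = 1e-5  # hypothetical maximum e-value


def confident_hits(hits: List[Dict[str, str]], min_len: int = MIN_ALN_LEN,
                   max_evalue: float = MAX_EVALUE) -> List[Dict[str, str]]:
    """Keep only hits that pass both the alignment-length and e-value cutoffs."""
    return [
        h for h in hits
        if int(h["length"]) >= min_len and float(h["evalue"]) <= max_evalue
    ]


# The example hit from the question: 18 aa aligned, e-value 0.22.
EXAMPLE_HIT = {"qseqid": "NODE_100002_length_1286_cov_2.42892_1132_1285_-",
               "length": "18", "evalue": "0.22"}

if __name__ == "__main__":
    print(confident_hits([EXAMPLE_HIT]))
```

Under these illustrative cutoffs the example hit fails both tests, so its ORF would be left unassigned rather than given a guessed taxid.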


That's a good point! I hadn't realized this is one of the shorter ORF calls. So in this case, would it have mapped equally well to 1227265 and 1227266?
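As far as I understand the staxids field, both taxids here belong to the same subject accession (WP_021626941.1), so they share this hit's bitscore and the alignment itself cannot prefer one over the other. Differences only appear across different subject hits for the same ORF. A minimal sketch of the "top percent" idea used by LCA-style classifiers: per query, keep every hit within a window of the best bitscore and pool their taxids (the pooled set could then be collapsed to its lowest common ancestor). The 10% window is an arbitrary illustrative value, and the hits in the usage example other than the question's are made up.

```python
"""Pool taxids from every hit scoring close to a query's best bitscore."""

from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (qseqid, bitscore, taxids); in practice these would come from parsed output.
Hit = Tuple[str, float, List[int]]


def pooled_taxids(hits: Iterable[Hit], top_percent: float = 10.0) -> Dict[str, Set[int]]:
    """For each query, union the taxids of hits within top_percent of its best bitscore."""
    by_query: Dict[str, List[Hit]] = defaultdict(list)
    for hit in hits:
        by_query[hit[0]].append(hit)
    result: Dict[str, Set[int]] = {}
    for qseqid, qhits in by_query.items():
        best = max(score for _, score, _ in qhits)
        cutoff = best * (1 - top_percent / 100)
        result[qseqid] = {t for _, score, taxids in qhits if score >= cutoff
                          for t in taxids}
    return result


if __name__ == "__main__":
    hits = [
        ("q1", 44.3, [1227265, 1227266]),  # the hit from the question
        ("q1", 30.0, [999]),               # made-up weaker hit, outside the window
        ("q2", 100.0, [7]),                # made-up hits for a second query
        ("q2", 95.0, [8]),
    ]
    print(pooled_taxids(hits))
```

A single hit with two taxids always contributes both, which matches the point above: the score alone cannot separate taxids that share one subject sequence.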
