Best BLAST/alignment hits: which criterion is most important?
3 months ago
min • 0

I ran searches with mmseqs2 and diamond and am now trying to find the best hits. I need to sort the hits by e-value, bit score, percentage of identical matches, and so on, but I'm unsure how to prioritize these factors. Which criterion is most important?

blast • 478 views
3 months ago
Mensur Dlakic ★ 26k

There is no way to run BLAST using mmseqs2 and diamond. Maybe you mean that you compared sequences using mmseqs2 and diamond?

I need to sort hits by e-value, bit score, Percentage of identical matches, ... but I'm unsure of how to prioritize these factors. which criteria is more important?

That is impossible to answer without knowing your goal. If your goal is to find all related sequences, e-values < 1e-5 pretty much guarantee some kind of relationship. If you want the most closely related sequences, percent identity and coverage are the relevant criteria. Bit-scores don't help much because they are length-dependent: longer sequences will have higher bit-scores even at lower sequence identity. E-values smaller than ~1e-330 become zeros, so sorting by e-value won't separate highly significant matches.
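To make the points above concrete, here is a minimal Python sketch. The hit values, the target names, and the query length are made up for illustration, and coverage is approximated as alignment length divided by query length (alignment length includes gaps, so this is only a rough estimate).

```python
# Hypothetical length of one reference enzyme (assumption for illustration).
QUERY_LEN = 420

# Hypothetical hits: (target, percent identity, alignment length,
# e-value as written in a results file, bit score).
hits = [
    ("mag_1", 98.0, 410, "1e-330", 850.0),
    ("mag_2", 62.0, 415, "0.0", 540.0),
    ("mag_3", 35.0, 120, "2e-7", 55.0),
    ("mag_4", 28.0, 60, "0.3", 30.0),
]


def parse(hit):
    """Turn one raw hit tuple into a dict with a rough query coverage."""
    target, pident, aln_len, evalue_text, bits = hit
    return {
        "target": target,
        "pident": pident,
        "coverage": aln_len / QUERY_LEN,  # approximate, gaps are counted
        "evalue": float(evalue_text),     # "1e-330" underflows to 0.0
        "bits": bits,
    }


parsed = [parse(h) for h in hits]

# Goal 1: everything that is plausibly related.
related = [h for h in parsed if h["evalue"] < 1e-5]

# Goal 2: the most closely related sequences. mag_1 and mag_2 tie at
# e-value 0.0, so identity and coverage have to do the ranking.
closest = sorted(related, key=lambda h: (-h["pident"], -h["coverage"]))

for h in closest:
    print(h["target"], h["evalue"], h["pident"], round(h["coverage"], 2))
```

The two top hits are indistinguishable by e-value once it is stored as a floating-point number, which is why a second criterion is needed for highly significant matches.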

I suggest you present a better description of your intentions and maybe we will be able to help.


Thank you for correcting me. What I actually did was run searches with mmseqs2 and diamond to find enzymes in a MAG that are similar to my reference enzymes. I got multiple hits for each reference, and now I'm trying to narrow them down by setting an e-value threshold. I want to find the best hits, find the overlaps, and plot the e-values. Most of the hits have very low e-values, but I'm not sure about using e-value as the only criterion, since some hits may have a short alignment length or differ in other factors. So I wanted to know whether there is an order in which hits should be sorted, for example: 1. e-value, 2. alignment length, and so on.
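For reference, a rough sketch of the thresholding and overlap step, assuming "overlaps" means reference/target pairs reported by both tools. The hit lists, the names, and the 1e-5 cutoff are placeholders, not values from the actual results.

```python
# Hypothetical (query, target, e-value) triples from each tool.
mmseqs_hits = [
    ("refA", "mag_1", 1e-120),
    ("refA", "mag_7", 3e-4),
    ("refB", "mag_2", 5e-60),
]
diamond_hits = [
    ("refA", "mag_1", 4e-118),
    ("refB", "mag_2", 2e-58),
    ("refB", "mag_9", 1e-8),
]


def significant_pairs(hits, max_evalue=1e-5):
    """Return the set of (query, target) pairs at or below the e-value cutoff."""
    return {(query, target) for query, target, evalue in hits if evalue <= max_evalue}


mm = significant_pairs(mmseqs_hits)
dm = significant_pairs(diamond_hits)

shared = mm & dm        # pairs found by both tools
only_mmseqs = mm - dm   # pairs found only by mmseqs2
only_diamond = dm - mm  # pairs found only by diamond

print("shared:", sorted(shared))
print("mmseqs2 only:", sorted(only_mmseqs))
print("diamond only:", sorted(only_diamond))
```

Pairs that pass the cutoff in both tools are probably the safest candidates to carry forward; the tool-specific ones may deserve a closer look at alignment length and identity.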


If your goal is to find the sequence most similar to each of your references, your sorting order (e-value first, then alignment length) is correct. The next two criteria should probably be percent identity and bit-score.
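A minimal sketch of that sorting order in Python, assuming the results are in the 12-column BLAST-style tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and that the third column is percent identity. The rows below are invented, so check the column order and identity scale of your own output first.

```python
import csv
import io

# Assumed column layout of the tabular output.
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

# Hypothetical tab-separated hits for two reference enzymes.
TABLE = """\
refA\tmag_1\t92.5\t400\t30\t0\t1\t400\t5\t404\t0.0\t780
refA\tmag_2\t98.0\t150\t3\t0\t1\t150\t1\t150\t0.0\t300
refA\tmag_3\t45.0\t380\t200\t9\t10\t389\t2\t381\t1e-80\t250
refB\tmag_4\t60.0\t310\t120\t4\t1\t310\t1\t310\t1e-100\t330
refB\tmag_5\t75.0\t310\t77\t1\t1\t310\t1\t310\t1e-100\t420
"""


def read_hits(text):
    """Parse tab-separated hits into dicts with numeric sort fields."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        hits.append(hit)
    return hits


def sort_key(hit):
    """Lowest e-value first, then longest alignment, identity, bit-score."""
    return (hit["evalue"], -hit["length"], -hit["pident"], -hit["bitscore"])


def best_hits(hits):
    """Keep the top-ranked hit for each query under sort_key."""
    best = {}
    for hit in sorted(hits, key=sort_key):
        best.setdefault(hit["qseqid"], hit)
    return best


best = best_hits(read_hits(TABLE))
for query, hit in best.items():
    print(query, hit["sseqid"], hit["evalue"], hit["length"], hit["pident"])
```

In this toy table the two refA hits tie at e-value 0.0 and alignment length breaks the tie, while the two refB hits tie on both e-value and length, so percent identity decides.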
