Confused by local BLAST output - please help
4.9 years ago

Hi all,

I ran a local protein BLAST and the output contains a lot of duplicate hits. I filtered the results by e-value (<= 0.01), query coverage (>= 70%), percent identity (>= 30%), and higher bit score.
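For reference, the filtering step described above could look roughly like the sketch below. It assumes the search was run with tabular output that includes a query coverage column, e.g. `-outfmt "6 std qcovs"`; that column layout, the function names and the example rows are all assumptions made for illustration, not taken from the original run.

```python
import csv
import io

# Assumed layout: the 12 standard tabular columns plus qcovs,
# i.e. the output of: blastp ... -outfmt "6 std qcovs"
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore", "qcovs"]

def read_hits(handle):
    """Parse tab-separated BLAST rows into dicts, converting numeric fields."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        for key in ("pident", "evalue", "bitscore", "qcovs"):
            hit[key] = float(hit[key])
        for key in ("length", "qstart", "qend", "sstart", "send"):
            hit[key] = int(hit[key])
        hits.append(hit)
    return hits

def filter_hits(hits, max_evalue=0.01, min_qcov=70.0, min_pident=30.0):
    """Keep hits that pass the e-value, query coverage and identity cutoffs."""
    return [h for h in hits
            if h["evalue"] <= max_evalue
            and h["qcovs"] >= min_qcov
            and h["pident"] >= min_pident]

# Made-up rows purely for illustration (not real BLAST results).
rows = [
    ["q1", "VF001", "45.2", "700", "380", "4", "158", "840", "10", "700", "1e-50", "310", "80"],
    ["q1", "VF002", "28.0", "650", "460", "8", "178", "840", "5", "650", "1e-20", "150", "77"],
    ["q2", "VF003", "60.0", "96", "38", "0", "5", "100", "1", "96", "0.5", "30.1", "95"],
]
example = io.StringIO("\n".join("\t".join(r) for r in rows))
kept = filter_hits(read_hits(example))
for h in kept:
    print(h["qseqid"], h["sseqid"], h["evalue"], h["bitscore"])
```

With a real file, `read_hits(open("results.tsv"))` would replace the `io.StringIO` example (the file name is hypothetical).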

Now, my problem:

  • For many hits, the query start position differs (e.g. 158 and 178) but the query end position is the same (say 840), and in many other cases it is the other way round.

How do I sort this out? Which hit should I consider? And if one hit covers query positions 5-100 and another covers 10-90 (which lies within 5-100), which one should I keep?
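To make the "10-90 lies within 5-100" case concrete, here is a minimal sketch of one possible rule: drop any query interval that is fully contained in another one. This is only an illustration of the interval logic, not a recommendation built into BLAST, and it only makes sense when the intervals belong to the same query (and usually the same subject). The function names are made up for this example.

```python
def contains(outer, inner):
    """True if interval `inner` lies entirely within interval `outer`.

    Intervals are (start, end) tuples with start <= end, both inclusive.
    """
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def drop_contained(intervals):
    """Return the intervals that are not contained in any other interval."""
    unique = sorted(set(intervals))  # exact duplicates collapse to one entry
    kept = []
    for iv in unique:
        if not any(other != iv and contains(other, iv) for other in unique):
            kept.append(iv)
    return kept

# Query positions taken from the examples in the question.
print(drop_contained([(5, 100), (10, 90)]))
print(drop_contained([(158, 840), (178, 840)]))
```

Partially overlapping intervals (neither contains the other) are both kept by this rule, so they would still need a separate decision, for example by bit score.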

Thanks in advance. Wang

blast • 1.1k views

I assume you are referring to high-scoring segment pairs (HSPs), which are defined as:

The fundamental unit of BLAST algorithm output is the High-scoring Segment Pair (HSP). An HSP consists of two sequence fragments of arbitrary but equal length whose alignment is locally maximal and for which the alignment score meets or exceeds a threshold or cutoff score.

It is unclear what exactly you are trying to do, but this is how BLAST is supposed to work.


Hey, thanks for responding. I'm a beginner with BLAST, so please help me out. I'll try to explain my problem again. I ran blastp with a proteome file as the query against the virulence factor database, to find out which virulence genes are present in my FASTA file. The query sequences align to the subject sequences with varying e-value, query coverage and bit score. I filtered the results, keeping only those with e-value <= 0.01, query coverage >= 70% and the higher bit score.

After all this, I still see many alignments with the same query start position but different end positions. For example:

  1. first alignment: query start position = 1, end position = 110
  2. second alignment: query start position = 1, end position = 130

Of these two, which one should I keep and which one should I discard?


Are those two alignments going to the same "subject"? If so, you would "keep" the subject rather than the specific HSP. If the two HSPs are from two different "subjects" then you would need to consider both. Apologies if this sounds vague, but I am not exactly certain about the final result you want.
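As a rough sketch of what "keep the subject rather than the specific HSP" could mean in practice: group the HSPs by (query, subject) pair and keep one representative per pair, here the one with the highest bit score. The dictionary keys mirror the tabular column names `qseqid`, `sseqid` and `bitscore`; the choice of bit score as the tie-breaker and the example values are assumptions.

```python
from collections import defaultdict

def best_hsp_per_pair(hsps):
    """Keep the highest-scoring HSP for each (query, subject) pair."""
    best = {}
    for h in hsps:
        key = (h["qseqid"], h["sseqid"])
        if key not in best or h["bitscore"] > best[key]["bitscore"]:
            best[key] = h
    return best

# Made-up HSPs: two against the same subject, one against a different subject.
hsps = [
    {"qseqid": "q1", "sseqid": "VF001", "qstart": 1, "qend": 110, "bitscore": 95.0},
    {"qseqid": "q1", "sseqid": "VF001", "qstart": 1, "qend": 130, "bitscore": 120.0},
    {"qseqid": "q1", "sseqid": "VF002", "qstart": 1, "qend": 110, "bitscore": 80.0},
]

best = best_hsp_per_pair(hsps)

# List the distinct subjects hit by each query.
subjects_by_query = defaultdict(list)
for (query, subject) in best:
    subjects_by_query[query].append(subject)

for query, subjects in subjects_by_query.items():
    print(query, sorted(subjects))
```

The two HSPs against the same subject collapse into one entry, while the HSP against the second subject stays as a separate hit to consider.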


OK, so...

Can I remove duplicated subject sequences from the subject file (i.e. the database) before feeding it to BLAST?

I'm running BLAST against the database to search for virulence genes, and it gave me an output with 5000 alignments. After filtering by e-value, query coverage and bit score and deleting the duplicated alignments, I still have some 400 alignments. Most of them show the same query sequence aligning to different subject sequences, and in a few cases the same subject sequence aligns to different query sequences. How should I sort the output from here?

If I use the online tool, it reports very few genes in my sequence (around 10 to 20). So how do I narrow my output down to the most appropriate alignments?
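One common way to narrow such a list down is to keep a single best hit per query sequence. The sketch below picks the hit with the highest bit score for each query, breaking ties by the lower e-value; whether "best hit per query" is the right criterion for this analysis is an assumption, and the hit values are made up.

```python
def best_hit_per_query(hits):
    """Keep one hit per query: highest bit score, then lowest e-value."""
    best = {}
    for h in hits:
        current = best.get(h["qseqid"])
        rank = (h["bitscore"], -h["evalue"])
        if current is None or rank > (current["bitscore"], -current["evalue"]):
            best[h["qseqid"]] = h
    return best

# Made-up hits: q1 matches two subjects, q2 matches one of the same subjects.
hits = [
    {"qseqid": "q1", "sseqid": "VF001", "evalue": 1e-50, "bitscore": 310.0},
    {"qseqid": "q1", "sseqid": "VF002", "evalue": 1e-20, "bitscore": 150.0},
    {"qseqid": "q2", "sseqid": "VF001", "evalue": 1e-30, "bitscore": 200.0},
]

top = best_hit_per_query(hits)
distinct_subjects = sorted({h["sseqid"] for h in top.values()})

for query, h in top.items():
    print(query, h["sseqid"], h["bitscore"])
print("distinct subjects:", distinct_subjects)
```

Counting distinct subjects among the best hits, rather than counting alignments, gives a number that is easier to compare with the short gene list from the online tool.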

