6.3 years ago
namgyalwangstuklama ▴ 10
Hi all,
I ran a local protein BLAST (blastp), and the output contains a lot of duplicated hits. I have already filtered the results by e-value (<= 0.01), query coverage (>= 70%) and percent identity (>= 30%), preferring hits with a higher bit score.
Now, coming to my problem:
- For several alignments the query start position differs (e.g. 158 and 178) while the end position is the same (say 840), and in many cases it is the other way around.
How do I sort this out? Which one should I consider? And if one hit covers query positions 5-100 and another covers 10-90 (which lies within 5-100), which one should I consider?
Thanks in advance. Wang
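The thresholds described in the question can be applied programmatically. The following is a minimal sketch, assuming the hits have already been parsed into dictionaries keyed by the BLAST+ tabular field names `qseqid`, `sseqid`, `pident`, `qcovs`, `evalue` and `bitscore`; the function name `filter_hits` and the example rows are hypothetical and only illustrate the idea.

```python
from typing import Dict, List, Union

Hit = Dict[str, Union[str, float]]


def filter_hits(
    hits: List[Hit],
    max_evalue: float = 0.01,
    min_qcov: float = 70.0,
    min_pident: float = 30.0,
) -> List[Hit]:
    """Keep only hits that pass all three thresholds from the question."""
    return [
        h
        for h in hits
        if h["evalue"] <= max_evalue
        and h["qcovs"] >= min_qcov
        and h["pident"] >= min_pident
    ]


# Made-up rows for illustration only (not real BLAST output).
example_hits = [
    {"qseqid": "q1", "sseqid": "s1", "pident": 45.0, "qcovs": 85.0, "evalue": 1e-30, "bitscore": 210.0},
    {"qseqid": "q1", "sseqid": "s2", "pident": 28.0, "qcovs": 90.0, "evalue": 1e-5, "bitscore": 60.0},
    {"qseqid": "q2", "sseqid": "s3", "pident": 50.0, "qcovs": 40.0, "evalue": 1e-10, "bitscore": 95.0},
]

kept_hits = filter_hits(example_hits)
print([(h["qseqid"], h["sseqid"]) for h in kept_hits])
```

In this toy input only the q1/s1 row passes: the second row fails the identity cutoff and the third fails the coverage cutoff.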
I assume you are referring to high scoring segment pairs (HSP) which are defined as
It is unclear what exactly you are trying to do, but this is how BLAST is supposed to work.
Hey, thanks for responding. I'm a beginner with BLAST, so please help me out. I'll try to explain my problem again. I ran blastp with a proteome file against the virulence factor database to find out which virulence genes are present in my FASTA file. The query sequences align to the subject sequences with varying e-values, query coverage and bit scores, and I have filtered the results to keep only those with e-value <= 0.01, query coverage >= 70% and a higher bit score.
Now, after doing all this, I see many alignments with the same query start position but different end positions, e.g.:
Now, out of these two, which one should I keep and which one should I discard?
Are those two alignments going to the same "subject"? If so, you would "keep" the subject rather than the specific HSP. If the two HSPs are from two different "subjects", then you would need to consider both. Apologies if this sounds vague, but I am not exactly certain about the final result you want.
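One way to act on this is to collapse the HSPs so that each query/subject pair appears once. The sketch below assumes the same dictionary layout as BLAST+ tabular fields (`qseqid`, `sseqid`, `qstart`, `qend`, `bitscore`) and keeps the highest-scoring HSP per pair as the representative; that choice, the function name `best_hsp_per_pair` and the example rows are assumptions for illustration, not something prescribed in this thread.

```python
from typing import Dict, List, Tuple, Union

Hsp = Dict[str, Union[str, int, float]]


def best_hsp_per_pair(hsps: List[Hsp]) -> List[Hsp]:
    """Return one HSP per (query, subject) pair: the one with the top bit score."""
    best: Dict[Tuple[str, str], Hsp] = {}
    for h in hsps:
        key = (h["qseqid"], h["sseqid"])
        # Replace the stored HSP only if this one scores strictly higher.
        if key not in best or h["bitscore"] > best[key]["bitscore"]:
            best[key] = h
    return list(best.values())


# Made-up rows: two HSPs to the same subject, one HSP to a different subject.
example_hsps = [
    {"qseqid": "q1", "sseqid": "s1", "qstart": 178, "qend": 840, "bitscore": 480.0},
    {"qseqid": "q1", "sseqid": "s1", "qstart": 158, "qend": 840, "bitscore": 500.0},
    {"qseqid": "q1", "sseqid": "s2", "qstart": 5, "qend": 100, "bitscore": 90.0},
]

collapsed = best_hsp_per_pair(example_hsps)
print([(h["sseqid"], h["qstart"], h["qend"]) for h in collapsed])
```

The two HSPs against s1 collapse into one row, while the HSP against s2 is kept because it belongs to a different subject.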
Can I remove duplicated subject sequences present in the subject file (i.e. the database) before feeding it to BLAST?
I'm running BLAST against the database to search for virulence genes, and it gave me an output with 5000 alignments. Even after filtering by e-value, query coverage and bit score and deleting the duplicated alignments, I still get some 400 alignments. Most of them show the same query sequence aligning with different subject sequences, and in a few cases the same subject sequence aligns with different query sequences. How should I sort the output from here?
If I use the online tool, it shows very few genes present in my sequence (around 10 to 20 genes). So how do I narrow down my output to the most appropriate alignments?
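One common way to narrow such a list is to keep a single best subject per query. The sketch below is only one possible approach, assuming hits are dictionaries with the BLAST+ tabular fields `qseqid`, `sseqid`, `evalue` and `bitscore`; ranking by highest bit score with ties broken by lowest e-value is an assumption, and `best_hit_per_query` and the example rows are hypothetical.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]


def best_hit_per_query(rows: List[Row]) -> Dict[str, Row]:
    """Map each query ID to its top hit (highest bit score, then lowest e-value)."""
    best: Dict[str, Row] = {}
    for r in rows:
        q = r["qseqid"]
        current = best.get(q)
        # Compare (bitscore, -evalue) so a lower e-value wins a bit-score tie.
        if current is None or (r["bitscore"], -r["evalue"]) > (
            current["bitscore"],
            -current["evalue"],
        ):
            best[q] = r
    return best


# Made-up rows: q1 matches three subjects, q2 matches one.
example_rows = [
    {"qseqid": "q1", "sseqid": "s1", "evalue": 1e-40, "bitscore": 300.0},
    {"qseqid": "q1", "sseqid": "s2", "evalue": 1e-60, "bitscore": 410.0},
    {"qseqid": "q1", "sseqid": "s3", "evalue": 1e-20, "bitscore": 150.0},
    {"qseqid": "q2", "sseqid": "s1", "evalue": 1e-15, "bitscore": 120.0},
]

top_hits = best_hit_per_query(example_rows)
print({q: r["sseqid"] for q, r in top_hits.items()})
```

This reduces the table to one row per query. Whether a single best hit is the right summary depends on the goal: if several distinct virulence factors could match the same protein, keeping the top few hits per query may be more appropriate.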