Question: incomplete (all-vs-all) blastp results
3.7 years ago by
a.abnousi30
a.abnousi30 wrote:

I have a FASTA file with 254 sequences. I created a BLAST database with masking and then ran blastp against that database, using the same FASTA file as the query (again with masking). The commands are shown below.

But in the results only 203 sequences appear as queries (while all 254 appear as subjects). My tabular blastp output looks like this, whereas I would expect the last line to start with "254 ... ... ... ...":

qry id, sbj id, % identity, length, mismatches, gap_opens, q_start, q_end, s_start, s_end, evalue, bit_score

1     30     29.42  673    409     19      4       643     7       646     2e-66   242
1     30     38.26  115    71      0       781     895     645     759     1e-22   106
1     185    27.99  661    350     20      289     889     322     916     2e-59   223
2     253    28.86  648    366     20      267     895     209     780     9e-58   216
.
.
.
203     16     41.30  293    148     3      607    895     529     801     2e-57   216
203    16     29.75  511    305     13     44      542     64      532     5e-40   162

Note that query sequence #2 is matched against sequence #253, but sequence #253 is not queried at all, the last sequence being 203.

I'm not sure whether my expectation is right. Shouldn't the last line be sequence #254 queried against some matching subjects? (The sequences are mostly similar, so it is very unlikely that #204-#254 don't align with anything.) Or is this the correct result? If so, can you explain what happens to #204-#254? Thanks!
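A quick way to see exactly which queries are absent from the report is to compare the IDs in the FASTA headers with column 1 of the tabular output. The sketch below is only an illustration: missing_queries is a hypothetical helper name, and it assumes that each query ID in the report equals the first whitespace-delimited token of the corresponding FASTA header.

```shell
# Hypothetical helper (not part of BLAST): print every FASTA ID that never
# appears as a query (column 1) in a tabular (-outfmt 6) report.
# Assumption: the report's query ID is the first token after '>' in the header.
missing_queries() {
    fasta=$1
    hits=$2
    awk '
        # First file (the report): remember every query ID seen in column 1.
        FILENAME == ARGV[1] { seen[$1] = 1; next }
        # Second file (the FASTA): print header IDs that were never a query.
        /^>/ { id = substr($1, 2); if (!(id in seen)) print id }
    ' "$hits" "$fasta"
}
```

For the files in this post the call would be `missing_queries my_fasta.fasta my_fasta_blasted`. If it prints the IDs 204 through 254, those queries produced no reported hits at all, rather than being renumbered or merged.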

Here is how I ran BLAST:

./segmasker -in my_fasta.fasta -infmt fasta -outfmt maskinfo_asn1_bin -out my_seg_output.asnb

./makeblastdb -in my_fasta.fasta -input_type fasta -dbtype prot -mask_data my_seg_output.asnb -out my_db -title my_db

./blastp -query my_fasta.fasta -out my_fasta_blasted -evalue 1.0 -dbsize $db_size -max_hsps $hsps -seg "yes" -db_hard_mask 21 -db my_db -outfmt 6

It turned out that some time ago I asked a similar question.

A: each protein with each protein

The answers may be helpful.

written 3.7 years ago by natasha.sernova

Thanks for your reply! I looked at that question, but it explains how to run an all-vs-all BLAST, which I have already done. (I additionally masked with segmasker, which might have caused the problem?)

written 3.7 years ago by a.abnousi

Look at this post:

A: How To Mask Low-Complexity Regions In Proteins?

I suspect you may lose some proteins when you mask your data.

What happens when you omit masking?
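One way to try this is to rerun the search with query SEG filtering turned off and without the database hard mask, then compare how many distinct queries each report contains. This is only a sketch under assumptions: run_unmasked and count_queries are hypothetical helper names, my_fasta_blasted_nomask is an assumed output file name, and the other file names and variables are taken from the commands in the question.

```shell
# Hypothetical helper: same search as in the question, but with -seg no and
# no -db_hard_mask, so neither the query nor the database is masked.
# my_fasta_blasted_nomask is an assumed output name.
run_unmasked() {
    ./blastp -query my_fasta.fasta -db my_db -out my_fasta_blasted_nomask \
        -evalue 1.0 -dbsize "$db_size" -max_hsps "$hsps" \
        -seg no -outfmt 6
}

# Hypothetical helper: count distinct query IDs (column 1) in a tabular report.
count_queries() {
    cut -f1 "$1" | sort -u | wc -l | tr -d ' '
}
```

After calling `run_unmasked`, compare `count_queries my_fasta_blasted` with `count_queries my_fasta_blasted_nomask`. If the unmasked run reports more distinct queries, masking is a likely reason for the missing ones; if both counts are the same, the cause is probably elsewhere.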

modified 3.7 years ago • written 3.7 years ago by natasha.sernova