Not getting all hits in blast+ all vs. all
8.7 years ago
bioPiraten ▴ 10

I realize there are multiple threads and extensive documentation about blast+, but after traversing most of that information, I haven't been able to find the answer to my problem.

I am running an all vs. all blast+ search against a private database built with makeblastdb.

I have 529 proteins in my DB, which I build with:

makeblastdb -in C_domains.faa -dbtype 'prot' -out Cdomains_DB

Then I blast against it:

blastp -query C_domains.faa -db Cdomains_DB -num_threads 16 -out C.blast -max_target_seqs 529

When I parse the result, I get a varying number of alignments/subject hits per query sequence, ranging from 400 to 529. I want all the alignment scores, i.e. 529 alignments for every query.

I tried setting -max_target_seqs 10000, but it didn't change a thing.

... As a side note, I've tried the same thing for another database with 300 proteins, and it returned 300 hits each time...
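For anyone checking the same symptom: a minimal sketch of counting distinct subject hits per query. It assumes the search was rerun with tabular output (-outfmt 6), which the command above does not do, and the function name and sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def subjects_per_query(handle):
    """Map each query ID to the set of distinct subject IDs it hit."""
    hits = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hits[row[0]].add(row[1])
    return hits


# Made-up rows in the -outfmt 6 layout (qseqid, sseqid, pident, ...).
example = io.StringIO(
    "seqA\tseqA\t100.0\n"
    "seqA\tseqB\t45.2\n"
    "seqA\tseqB\t38.0\n"  # second HSP for the same pair, counted once
    "seqB\tseqB\t100.0\n"
)
counts = {query: len(subjects) for query, subjects in subjects_per_query(example).items()}
print(counts)  # prints {'seqA': 2, 'seqB': 1}
```

With a real file, pass an open file handle instead of the `io.StringIO` example; queries whose count is below 529 are the ones with missing pairs.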

blast-plus • 2.4k views
8.7 years ago
Michael 54k

Blast didn't find alignments for the missing pairs because those sequences are too dissimilar.

To increase the sensitivity, you can reduce the -word_size parameter to 2, and to filter out fewer results you can raise the -evalue parameter to 1000 or so. As blast is a heuristic, it might still not find an alignment for every pair. If you simply need an alignment of everything, irrespective of how bad, you should use full Smith-Waterman, e.g. SSEARCH (from the FASTA package), with a very high E-value cutoff.
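As a rough sketch, the suggested settings could be put together like this. The file and database names are taken from the question; the output name C.blast.tsv, the helper name, and the choice of tabular output (-outfmt 6) are assumptions added for illustration. The snippet only builds and prints the command line, it does not run BLAST.

```python
import shlex


def sensitive_blastp_command(query, db, out, n_seqs, threads=16):
    """Build a blastp argument list with a small word size and a loose E-value."""
    return [
        "blastp",
        "-query", query,
        "-db", db,
        "-out", out,
        "-num_threads", str(threads),
        "-max_target_seqs", str(n_seqs),
        "-word_size", "2",   # smaller seed words: more sensitive, slower
        "-evalue", "1000",   # loose cutoff so weak hits are still reported
        "-outfmt", "6",      # tabular output (assumption, easier to parse)
    ]


cmd = sensitive_blastp_command("C_domains.faa", "Cdomains_DB", "C.blast.tsv", 529)
print(shlex.join(cmd))

# To actually run it (requires BLAST+ on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Even with these settings, some pairs may still be absent from the output, for the reason given above.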

By the way, what do you need that for? I have the feeling there might be a more appropriate method for whatever you are trying to do (creating some sort of 'distance matrix', or multiple alignments?).


Thanks, I will try that. I am analysing a specific protein domain, which I have extracted from 30 organisms and now I want to look at the similarity distribution, and infer phylogenetic relationships. I figured blast was the most feasible way of getting % identity for all vs. all of 529 sequences with avg. length ~300 AAs.
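A minimal sketch of how the % identity values could be collected per pair, assuming the default 12-column tabular output (-outfmt 6) and keeping only the highest-bitscore HSP for each query/subject pair. The function names and sample rows are hypothetical, and pairs that blast did not report are listed rather than filled in.

```python
import csv
import io


def best_identity(handle):
    """Return {(query, subject): pident} using the top-bitscore HSP per pair."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 12:
            continue  # not a full default -outfmt 6 row
        pair = (row[0], row[1])
        pident, bitscore = float(row[2]), float(row[11])
        if pair not in best or bitscore > best[pair][1]:
            best[pair] = (pident, bitscore)
    return {pair: values[0] for pair, values in best.items()}


def missing_pairs(ids, identities):
    """List ordered (query, subject) pairs for which no alignment was reported."""
    return [(q, s) for q in ids for s in ids if (q, s) not in identities]


# Made-up rows: qseqid sseqid pident length mismatch gapopen
#               qstart qend sstart send evalue bitscore
example = io.StringIO(
    "seqA\tseqA\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t600\n"
    "seqA\tseqB\t45.0\t280\t150\t4\t5\t290\t3\t285\t1e-50\t200\n"
    "seqA\tseqB\t30.0\t50\t35\t0\t1\t50\t200\t250\t5.0\t25\n"
    "seqB\tseqB\t100.0\t310\t0\t0\t1\t310\t1\t310\t0.0\t620\n"
)
identities = best_identity(example)
print(identities[("seqA", "seqB")])
print(missing_pairs(["seqA", "seqB"], identities))
```

Note that pident from a local alignment describes only the aligned region, so values for short HSPs are not directly comparable to those spanning the full ~300 AA domain.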


Wouldn't it be better to directly do a standard multiple alignment and phylogeny? Or is that perhaps too many proteins for some MSA tools to handle?
