Blastall Vs Blast+ Problem With Filter Dust
12.0 years ago
adam.skowron ▴ 20

Hi all,

I used to use the old version of BLAST (blastall), but because of this problem I am trying to switch to BLAST+.

However, the same query (short nucleotide sequences, 15-30 bp) returns many more hits in BLAST+ (task blastn) than in blastall, and most of them are dust, i.e. low-complexity, sequences.

I'm using these parameters:

"-task", "blastn",
"-db", database, // human database
"-query", file.getPath(), // file with short sequences in FASTA format
"-gapopen", "5",
"-gapextend", "2",
"-penalty", "-3",
"-reward", "2",
"-dust", "yes",
"-word_size", "7",
"-num_alignments", "100",
"-num_descriptions", "50",
"-max_target_seqs", "50",
"-evalue", "250"

The question is: do you know how to set the dust parameter (or other parameters) to get results similar to those of the old version of BLAST? The most important thing is to keep word_size equal to 7.

At the moment I get a really huge output file. I have tried different values, but the file is still too big.

Thanks,

Adam

blastn filter • 4.8k views
12.0 years ago
Niek De Klein ★ 2.6k

You won't get the same results from blastall and BLAST+ because the algorithms were changed. One cause could be the finite size correction, although it seems to apply only to protein-protein programs, and I can imagine other things were changed too. You can read about it here, point 2). To quote: "For short queries or database sequences, it may change the expect value reported by orders of magnitude." In my own experience, all e-values got lower when I changed from blastall to BLAST+, so something in the parameters or the algorithm must have changed.

If the problem is that you get too many hits, lower evalue, max_target_seqs or num_alignments. Each of these should reduce the number of sequences you get, although it will not give results similar to blastall, and you might miss important information.
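As a minimal sketch of that suggestion, the argument list from the question could be rebuilt with a stricter e-value and a smaller target limit. The class name `BlastPlusArgs`, the method `build`, and the example values (`human_db`, `queries.fasta`, e-value 10, 5 targets) are hypothetical placeholders, not recommended settings. The sketch leaves out `-num_descriptions` and `-num_alignments` on the assumption that the installed BLAST+ treats them as incompatible with `-max_target_seqs`; check `blastn -help` for your version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BlastPlusArgs {
    // Builds a blastn command line; nothing is executed here.
    // evalue and maxTargetSeqs are the two knobs lowered to shrink the output.
    static List<String> build(String database, String queryPath,
                              double evalue, int maxTargetSeqs) {
        List<String> args = new ArrayList<>(Arrays.asList(
            "blastn",
            "-task", "blastn",
            "-db", database,
            "-query", queryPath,
            "-gapopen", "5",
            "-gapextend", "2",
            "-penalty", "-3",
            "-reward", "2",
            "-dust", "yes",
            "-word_size", "7"      // kept at 7, as the question requires
        ));
        args.add("-max_target_seqs");
        args.add(String.valueOf(maxTargetSeqs));
        args.add("-evalue");
        args.add(String.valueOf(evalue));
        return args;
    }

    public static void main(String[] unused) {
        // Placeholder values; pass the list to ProcessBuilder to actually run it.
        List<String> args = build("human_db", "queries.fasta", 10.0, 5);
        System.out.println(String.join(" ", args));
    }
}
```

The returned list can be handed to `new ProcessBuilder(args)` in the same way as the original parameter array.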

12.0 years ago
adam.skowron ▴ 20

Niek, thank you for your reply.

I am aware that the e-values are lower, but most of the hits have a value below 100. I tried changing parameters such as evalue and max_target_seqs (does it work for nucleotides? I did not see any change), but with the same parameters blastall gives me a file of ~10-50 MB while BLAST+ gives ~500-800 MB.

I know that most of the hits are low-complexity sequences (the same hit occurs multiple times) and I would like to filter them out, but the dust parameter does not seem to work with word size 7.

For example, here the e-value is set to 30 000, the target is 50 000 (!) and the word size is 7, and somehow they filter out the low-complexity regions. Do you know how?
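One workaround for the repeated hits, sketched below, is to post-process the output instead of relying on dust. This is not how blastall or the linked tool filters low-complexity regions; it only drops hits above an e-value cutoff and keeps the first hit per query/subject pair. It assumes the search was run with `-outfmt 6` (default tabular columns, with the e-value in column 11); the class `HitFilter` and the sample lines are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HitFilter {
    // Keeps the first line seen for each (query, subject) pair
    // whose e-value (column 11 of -outfmt 6) is at or below maxEvalue.
    static List<String> filter(List<String> lines, double maxEvalue) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("#")) continue;
            String[] f = line.split("\t");
            if (f.length < 12) continue;              // not a default tabular line
            if (Double.parseDouble(f[10]) > maxEvalue) continue;
            String key = f[0] + "\t" + f[1];          // qseqid + sseqid
            if (seen.add(key)) kept.add(line);        // add() is false for repeats
        }
        return kept;
    }

    public static void main(String[] unused) {
        // Made-up sample hits, not real BLAST output.
        List<String> hits = Arrays.asList(
            "q1\tchr1\t100.00\t20\t0\t0\t1\t20\t500\t519\t0.5\t40.1",
            "q1\tchr1\t100.00\t18\t0\t0\t1\t18\t900\t917\t2.0\t36.2",
            "q1\tchr2\t95.00\t20\t1\t0\t1\t20\t100\t119\t150\t30.2");
        for (String line : filter(hits, 100.0)) {
            System.out.println(line);
        }
    }
}
```

Because BLAST+ lists the hits for each query/subject pair in order of significance, keeping the first line per pair usually retains the best-scoring alignment, but that ordering is an assumption worth checking on your own output.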
