Question: Blastn Segmentation Fault
1
8.2 years ago by
Srihari30
Srihari30 wrote:

Hi,

I am running a BLASTN of about 150 sequences against a genome that is 2.2 gigabases long. A few of my queries are actually full length BAC end sequences running to around 150,000 bases. I expect to find huge, contiguous hits for some BACs in the genome. Here's the command I use -

blastn -query 80BACs.fasta -db mygenome -out 80BACsBLAST -outfmt 10 -num_threads 8 -evalue 10e-3 -index_name mygenomeMBI

Around 10 minutes after it starts running, the program halts after producing a segmentation fault. I did a 'ulimit -s unlimited' to set the stack size to unlimited, but to no avail. I also went easy on the number of threads in subsequent trials, setting num_threads to 5 and subsequently, 2 - but that didn't help either.

I am using the binaries from rmblast-1.2-ncbi-blast-2.2.23+. I had earlier run a smaller query dataset against the same genome, which worked fine; that BLAST completed in half a day. I am convinced this issue is due to a few very long query sequences. I'd highly appreciate any help in this regard!

Thanks,

Srihari

blast • 6.3k views
2
8.2 years ago by
Lee Katz3.0k
Atlanta, GA
Lee Katz3.0k wrote:

For such a large query, you might want to try a different tool. Consider what you actually want to do. MUMmer might be good if you want to find the region that your query matches.
If you are searching for homology with genes, consider breaking your query into individual genes before using blastn.

If you are still sure that you want to query with 150k against a 2.2 Gb genome using blastn, then you can try certain tricks like increasing the word size which will reduce your sensitivity (put it up to 28 or even up to 50ish). I forget which way might be better for you in terms of filtering, but switching filtering on or off might help you too.
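A minimal sketch of the word-size suggestion, reusing the file and database names from the question. `-word_size` and `-dust` are standard blastn options; the function name `blastn_long_query` and the `BLASTN` override variable are made up here for illustration, and 28 is simply the lower end of the range mentioned above.

```shell
# BLASTN is a hypothetical override so a different binary can be substituted.
BLASTN="${BLASTN:-blastn}"

# Usage: blastn_long_query QUERY.fasta DB OUT [WORD_SIZE]
# A larger word size means fewer seeds to extend (faster, less memory),
# at the cost of sensitivity. '-dust yes' / '-dust no' toggles the
# low-complexity filter on the query.
blastn_long_query() {
    query=$1; db=$2; out=$3; word_size=${4:-28}
    "$BLASTN" -query "$query" -db "$db" -out "$out" \
        -outfmt 10 -evalue 10e-3 -word_size "$word_size" -dust yes
}

# Example: blastn_long_query 80BACs.fasta mygenome 80BACsBLAST 28
```

Whether filtering on or off works better for BAC-sized queries is something to test on a small subset first.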

2
8.2 years ago by
Bergen, Norway
Michael Dondrup47k wrote:

I agree with Lee that maybe blast is not the right tool for such a long query sequence. Maybe for this purpose MUMmer is the better choice. Your description sounds more like a global alignment problem.

Otherwise:

The blast+ programs are regularly updated and bugs get fixed, so the first thing you should do is install the latest version; otherwise you may be running into a bug that has already been fixed. The latest version is 2.2.25 atm and available here: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

Install a 64-bit binary; that is very important too.

After that, if the error persists: I have just experienced myself that there still exist certain (even short) input sequences that can simply crash blast. If possible, try to isolate the input sequence that causes the crash by feeding in only parts of the input file, or a single sequence at a time. If that doesn't help, try changing the -outfmt switch to a different output format.
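The isolation step could be scripted roughly like this. It is only a sketch: the function names and the `BLASTN` override variable are invented for illustration, the FASTA file is assumed to start every record with a `>` header line, and a non-zero exit status is taken as the sign of a crash.

```shell
# BLASTN is a hypothetical override so a different binary can be substituted.
BLASTN="${BLASTN:-blastn}"

# Write each FASTA record of file $1 to its own file:
# $2/seq_0001.fasta, $2/seq_0002.fasta, ...
split_fasta() {
    mkdir -p "$2"
    awk -v dir="$2" '
        /^>/ { if (out) close(out); out = sprintf("%s/seq_%04d.fasta", dir, ++n) }
        out  { print > out }
    ' "$1"
}

# Run blastn on every per-sequence file and print the files whose run failed
# (a segmentation fault shows up as a non-zero exit status).
# Usage: find_crashing_queries DIR DB
find_crashing_queries() {
    for f in "$1"/seq_*.fasta; do
        [ -e "$f" ] || continue
        "$BLASTN" -query "$f" -db "$2" -out "$f.out" -outfmt 10 >/dev/null 2>&1 \
            || echo "$f"
    done
}

# Example:
#   split_fasta 80BACs.fasta parts
#   find_crashing_queries parts mygenome
```

Each file name printed at the end points at one query that blastn could not finish, which narrows down what to report or to leave out.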

1
8.2 years ago by
Melbourne
Roman Valls Guimerà530 wrote:

Hello unknown,

You've set the ulimit to unlimited, but note that 'ulimit -s' only affects the stack size; core dumps are controlled by 'ulimit -c unlimited'. With that set, a "core" file should be generated (Segfault, core dumped), and you can debug further by running "gdb /path/to/blastn core_file".

(gdb) bt

This should give you a backtrace of the last functions that were called (if there are empty or "??" function names, you should compile blastn yourself with debugging symbols).

Alternatively, you can run "strace" before the command: "strace blastn 80BACs.fasta ..." and have a look at the last 100 system calls to figure out if something has gone wrong with memory management.
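Put together, the two debugging routes might look like the sketch below. The function names are hypothetical helpers; the `gdb` options `-batch` and `-ex` and the `strace` options `-f` and `-o` are standard, but where the core file ends up (and what it is called) depends on the system configuration.

```shell
# Allow core files in the current shell. 'ulimit -s' only changes the stack
# size; core dumps are controlled by 'ulimit -c'.
enable_core_dumps() {
    ulimit -c unlimited
}

# Print a backtrace from a core file without entering the interactive prompt.
# Usage: core_backtrace /path/to/blastn core_file
core_backtrace() {
    gdb -batch -ex bt "$1" "$2"
}

# Run a command under strace (following threads and child processes), write
# the trace to a log file, then show the last 100 system calls.
# Usage: trace_tail trace.log blastn -query 80BACs.fasta ...
trace_tail() {
    log=$1; shift
    strace -f -o "$log" "$@"
    tail -n 100 "$log"
}
```

Failed `mmap` or `brk` calls near the end of the trace would point towards the process running out of memory.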

Hope that helps !

Roman

0
6.7 years ago by
earonesty230
United States
earonesty230 wrote:

Try without -num_threads ... that can crash blastn. Fortunately, it's pretty easy to split the input file into chunks, then run blast, then assemble the outputs. This is the only reliable way to "multithread" blast right now.
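The split, run, and reassemble approach could be sketched as follows. The function names and the `BLASTN` override variable are assumptions for illustration; records are dealt round-robin into chunks, each chunk runs as its own single-threaded blastn process, and the CSV outputs (`-outfmt 10`) are simply concatenated.

```shell
# BLASTN is a hypothetical override so a different binary can be substituted.
BLASTN="${BLASTN:-blastn}"

# Distribute the records of FASTA file $1 round-robin over $2 chunk files
# named $3/chunk_1.fasta, $3/chunk_2.fasta, ...
split_into_chunks() {
    mkdir -p "$3"
    awk -v n="$2" -v dir="$3" '
        /^>/ { out = sprintf("%s/chunk_%d.fasta", dir, (i++ % n) + 1) }
        out  { print > out }
    ' "$1"
}

# Usage: chunked_blastn QUERY.fasta DB OUT N_CHUNKS
chunked_blastn() {
    workdir=$(mktemp -d) || return 1
    split_into_chunks "$1" "$4" "$workdir"
    for f in "$workdir"/chunk_*.fasta; do
        [ -e "$f" ] || continue
        # One background process per chunk, no -num_threads.
        "$BLASTN" -query "$f" -db "$2" -out "$f.out" -outfmt 10 -evalue 10e-3 &
    done
    wait    # block until every chunk has finished (exit codes are not checked)
    cat "$workdir"/chunk_*.fasta.out > "$3"
}

# Example: chunked_blastn 80BACs.fasta mygenome 80BACsBLAST 8
```

Hits come out grouped by chunk rather than in the original query order, and every process loads the database separately, so memory use grows with the number of chunks.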
