tblastn BLAST+ segmentation fault
3 months ago
bl02015 • 0

Dear colleagues,

I keep getting a segmentation fault whenever I try to run a tblastn command:

tblastn -db genome.fna -query protein.fas -out protein.out -num_threads 14 -outfmt 7


I have done the following so far: I downloaded the latest Linux version of BLAST+ from https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ , then compiled and installed the program. Afterwards I created a database from a fairly large genome (over 15 Gbp). The database consisted of two files as it was larger than 4GB.

makeblastdb -in genome.fna -parse_seqids -blastdb_version 5 -title "genome" -dbtype nucl -max_file_sz 4GB


I then tried to use tblastn with the following command:

tblastn -db genome.fna -query protein.fas -out protein.out -num_threads 14 -outfmt 7


This resulted in the aforementioned segmentation fault, although blastn runs normally. tblastn also works fine online when searching with the same protein against the same genome used in the command above.

The computer on which the commands are running has 128 GB of RAM and a 14-core processor, so I doubt the hardware is to blame.

I wonder what the cause of this error could be.

tblastn segmentation_fault blast_plus • 759 views

Are you sure the database you are blasting against is called 'genome.fna'? In any case, you only need to provide the prefix name of the BLAST database.

blastn doesn't run with the prefix name. I have to provide the name of the file from which the database was created in order for it to run. That is not the issue. When I give only the prefix (genome), it prints the following message:

BLAST Database error: No alias or index file found for nucleotide database [genome] in search path [/data/username::]

this is related to the comment of GenoMax below.

it can very well be that your DB is called "genome.fna" (I would personally try to avoid it but ok)

Could it be that you're asking for more resources than are available? (By the way, running it on 14 threads will not speed things up much: there is a known plateau for the number of threads in BLAST, and only parts of the whole procedure are multithreaded.)

How much memory do you have available on the machine you run this on? (Keep in mind that blastn, or rather megablast, which is its default task, uses far fewer resources than translated BLAST searches.)

The database consisted of two files as it was larger than 4GB.

What does that mean? Can you show us the output of: ls -l genome*

here is the list:

Genome.fna.00.nhr
Genome.fna.00.nin
Genome.fna.00.nog
Genome.fna.00.nsq
Genome.fna.01.nhr
Genome.fna.01.nin
Genome.fna.01.nog
Genome.fna.01.nsq
Genome.fna.nal
Genome.fna.ndb
Genome.fna.nos
Genome.fna.not
Genome.fna.ntf
Genome.fna.nto

OK, so your BLAST database is indeed called "Genome.fna" (with an upper-case G instead of the lower-case g in your command line).

It's upper case only because it's the beginning of the line. Sorry for that.

3 months ago
bl02015 • 0

It turns out it was a size issue. tblastn cannot run on a sequence longer than 1,073,741,821 bp (2^30 - 3). If I add another one, it gives a segmentation fault.
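
For anyone who wants to check their own genome against this threshold before building a database, here is a minimal Python sketch. It only counts residues per FASTA record and flags records above the limit reported in this thread; that limit was found by experiment here and is not taken from the BLAST documentation. The function names and the small demo input are hypothetical.

```python
import io

# Limit reported in this thread (found by experiment, not from BLAST docs):
# 2**30 - 3 = 1,073,741,821 bp.
REPORTED_LIMIT = 2**30 - 3


def sequence_lengths(handle):
    """Yield (sequence_id, length) for each record in a FASTA file handle."""
    seq_id, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, length
            # The ID is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            length = 0
        elif seq_id is not None:
            length += len(line)
    if seq_id is not None:
        yield seq_id, length


def too_long(handle, limit=REPORTED_LIMIT):
    """Return a list of (sequence_id, length) for records longer than limit."""
    return [(sid, n) for sid, n in sequence_lengths(handle) if n > limit]


# Tiny in-memory demo with an artificially small limit of 4 bp.
demo = io.StringIO(">chrA demo\nACGTA\n>chrB\nAC\n")
for sid, n in too_long(demo, limit=4):
    print(f"{sid}\t{n} bp exceeds the limit")
```

On a real genome you would pass an open file instead of the demo handle, for example `too_long(open("genome.fna"))`, and keep the default limit.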

Did you hear that from BLAST support, or did you find that out by experimentation? BTW: I'm curious as to why you have a sequence that long.

It's a whole genome. The database is that big, not the query sequence.

I figured it out. If I split the genome into smaller parts and create a database for each part, tblastn runs normally.
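
The splitting step can be sketched in Python as follows. This is only one possible approach, not the exact procedure used in this thread: it cuts every FASTA record longer than `max_len` into chunks and encodes the 1-based start and end coordinates in the new sequence ID, so hits can be mapped back to the original sequence. The function names, the ID scheme, and the `overlap` parameter are all assumptions made for illustration. Each record is held in memory as a string, so very long chromosomes need a corresponding amount of RAM.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, parts = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_record(header, seq, max_len, overlap=0):
    """Yield (new_header, chunk) pairs; short records pass through unchanged."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be >= 0 and smaller than max_len")
    if len(seq) <= max_len:
        yield header, seq
        return
    fields = header.split(None, 1)
    seq_id = fields[0] if fields else "seq"
    step = max_len - overlap
    start = 0
    while start < len(seq):
        end = min(start + max_len, len(seq))
        # Hypothetical ID scheme: originalID_start_end (1-based, inclusive).
        yield f"{seq_id}_{start + 1}_{end}", seq[start:end]
        if end == len(seq):
            break
        start += step


def split_fasta(handle_in, handle_out, max_len, overlap=0, width=60):
    """Write a FASTA in which no record is longer than max_len."""
    for header, seq in read_fasta(handle_in):
        for new_header, chunk in split_record(header, seq, max_len, overlap):
            handle_out.write(f">{new_header}\n")
            for i in range(0, len(chunk), width):
                handle_out.write(chunk[i:i + width] + "\n")


# Tiny in-memory demo: a 10 bp record cut into chunks of at most 4 bp.
out = io.StringIO()
split_fasta(io.StringIO(">chr1 demo\nACGTACGTAC\n"), out, max_len=4)
print(out.getvalue())
```

A hit that spans a chunk boundary would be missed with `overlap=0`, so a non-zero overlap may be worth considering; the coordinates in the chunk ID give the offset to add when converting hit positions back to the original sequence.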

Is there any program that can handle a sequence of 1,073,741,821 bp (2^30 - 3) or longer?

you have a protein that long ????? :scream:

Or do you mean an entry in your database? If so, I think it should have given an error or warning at database creation time as well.

The genome is that long, so the database is that long. If I split the genome into smaller parts and create a database from each part, it runs. So I believe the problem must be the genome size.

That can't be it: I have databases that are much, much bigger in total size. That is, of course, all sequences combined, not a single sequence. I do know that you can run into trouble with single sequences that are very long.

So one of your sequences (a chromosome) is >1 Gb long; that is indeed possible.