BWA indexer fails to generate fasta.sa file!
5.1 years ago
reza.jabal ▴ 460

Hi everyone,

I am indexing the human reference genome with BWA using the following command:

bwa index -a bwtsw reference.fa

but it fails to generate the .rbwt, .rpac, .rsa and .sa files. Does anyone know what these files are and how I can generate the .sa file?

sequencing alignment software error • 8.1k views

Are there any error messages? Is this the exact command you're running?


BWA doesn't print any error, but I am trying to find split reads using LUMPY, and it requires the fasta.sa file!

[bwt_restore_sa] fail to open file 'human_g1k_v37.fasta.sa' : No such file or directory


Are you sure you didn't mean to type human_g1k_v37.fasta.fa?


Is reference.fa a plain multi-FASTA file? This is a straightforward command and should work.

Can you run your command as follows and tell us what you see?

$ bwa index -a bwtsw reference.fa 2>&1

As @Devon points out below, reference.fa has to be replaced with a real file name (unless that is what your file is called).

[bwt_gen] Finished constructing BWT in 688 iterations.
[bwa_index] 3109.53 seconds elapse.
[bwa_index] Update BWT... 15.98 sec
[bwa_index] Pack forward-only FASTA... 15.56 sec
[bwa_index] Construct SA from BWT and Occ... Killed

What exit code does it give?
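One way to answer this, as a minimal sketch assuming a POSIX-style shell such as bash: `$?` holds the exit status of the last command, and by common shell convention a status above 128 means the process was terminated by signal number (status - 128). The helper name `describe_status` is made up for illustration and is not part of bwa.

```shell
#!/usr/bin/env bash
# describe_status is a hypothetical helper, not part of bwa.
# It interprets a shell exit status using the common 128+N convention.
describe_status() {
    local status=$1
    if [ "$status" -eq 0 ]; then
        echo "success"
    elif [ "$status" -gt 128 ]; then
        # For example, 137 = 128 + 9 (SIGKILL).
        echo "killed by signal $((status - 128))"
    else
        echo "failed with exit code $status"
    fi
}

# Usage (commented out so the sketch runs without bwa installed):
# bwa index -a bwtsw reference.fa
# describe_status $?
```

A status of 137 (signal 9, SIGKILL) alongside a bare "Killed" message often points to the kernel's out-of-memory killer or a scheduler memory limit, though other causes are possible.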


Are you using the latest bwa?


I am using bwa v0.7.12.

5.1 years ago
reza.jabal ▴ 460

OK guys, it appears to be a memory issue! I am sharing this in case anyone else encounters the same problem.

"Construct SA from BWT and Occ" is the last step in indexing, and it is also the step that uses the most memory. If the node does not have enough memory, data keeps being swapped between RAM and disk, and the process may be killed outright, as in the log above. For a 15GB reference genome, you may need around 25GB of memory for this step and the subsequent mapping. If you are using LSF/SGE, make sure you have requested enough memory.
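A quick sanity check before launching the job could look like the sketch below. It assumes a Linux node that exposes `/proc/meminfo`; the helper names `mem_total_gb` and `has_enough_memory` are made up for illustration, and the 25 in the usage line is just the rough figure quoted above, not a measured requirement. It looks at total RAM only, so under LSF/SGE the limit actually granted to the job may be lower.

```shell
#!/usr/bin/env bash
# mem_total_gb is a hypothetical helper: it prints total RAM in whole GB,
# read from a Linux-style meminfo file (default: /proc/meminfo).
mem_total_gb() {
    local meminfo=${1:-/proc/meminfo}
    # MemTotal is reported in kB; 1048576 kB = 1 GB.
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 }' "$meminfo"
}

# has_enough_memory NEEDED_GB [MEMINFO_FILE]
# Succeeds if total RAM is at least NEEDED_GB.
has_enough_memory() {
    local needed_gb=$1
    local total_gb
    total_gb=$(mem_total_gb "${2:-/proc/meminfo}")
    [ "$total_gb" -ge "$needed_gb" ]
}

# Usage:
# has_enough_memory 25 || echo "This node may be too small for the SA step" >&2
```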

As this is the last step, you can finish indexing without rerunning "bwa index" by running:

bwa bwt2sa ref.bwt ref.sa

This step should take several hours.
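To see which index files are actually missing and rebuild only the suffix array, something like the sketch below may help. It assumes the index files are named after the FASTA file itself (as the `human_g1k_v37.fasta.sa` error earlier in the thread suggests), so `ref.bwt` and `ref.sa` would stand for `<fasta>.bwt` and `<fasta>.sa`. The suffix list assumes the bwa 0.7.x layout; to my understanding the .rbwt, .rpac and .rsa files were only written by older bwa releases. The helper names `missing_index_files` and `finish_index` and the `DRY_RUN` variable are made up for illustration.

```shell
#!/usr/bin/env bash
# missing_index_files is a hypothetical helper: it prints each expected
# index file that does not exist for the given FASTA path.
# Assumption: bwa 0.7.x layout (.amb .ann .bwt .pac .sa).
missing_index_files() {
    local fasta=$1 ext
    for ext in amb ann bwt pac sa; do
        [ -e "$fasta.$ext" ] || echo "$fasta.$ext"
    done
}

# finish_index rebuilds only the suffix array, and only when the .bwt file
# exists but the .sa file does not.
# Set DRY_RUN=1 to print the command instead of running it.
finish_index() {
    local fasta=$1
    if [ -e "$fasta.bwt" ] && [ ! -e "$fasta.sa" ]; then
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "bwa bwt2sa $fasta.bwt $fasta.sa"
        else
            bwa bwt2sa "$fasta.bwt" "$fasta.sa"
        fi
    fi
}

# Usage:
# missing_index_files human_g1k_v37.fasta
# DRY_RUN=1 finish_index human_g1k_v37.fasta
```

Running with `DRY_RUN=1` first shows the exact command before committing to a step that may take hours.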
