BWA indexer fails to generate fasta.sa file!
5.1 years ago
reza.jabal ▴ 460

Hi everyone,

I am indexing the human reference genome with BWA using the following command:

bwa index -a bwtsw reference.fa

but it fails to generate the .rbwt, .rpac, .rsa and .sa files. I was wondering if anyone knows what these files are and how I can generate the .sa file.

sequencing alignment software error • 8.1k views

Are there any error messages? Is this the exact command you're running?


BWA doesn't report any errors, but I am trying to find split reads using LUMPY, and it requires the fasta.sa file:

[bwt_restore_sa] fail to open file 'human_g1k_v37.fasta.sa' : No such file or directory
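For anyone hitting the same error, here is a minimal Python sketch that checks which index files exist for a given prefix. It assumes the file set written by bwa 0.7.x (.amb, .ann, .bwt, .pac, .sa) and that the prefix is the FASTA file name, which is what `bwa index` uses when no `-p` prefix is given. The helper name `missing_index_files` is hypothetical.

```python
from pathlib import Path

# Files written by `bwa index` in bwa 0.7.x (assumption, see the note above).
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_index_files(prefix):
    """Return the expected index files for `prefix` that do not exist yet."""
    return [
        prefix + suffix
        for suffix in BWA_INDEX_SUFFIXES
        if not Path(prefix + suffix).exists()
    ]
```

For example, `missing_index_files("human_g1k_v37.fasta")` would be expected to list only `human_g1k_v37.fasta.sa` if indexing stopped during the final SA step, since the other files are written by earlier steps.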


Are you sure you didn't mean to type human_g1k_v37.fasta.fa?


Is reference.fa a plain multi-fasta format file? This is a straightforward command and should work.

Can you run your command as follows and tell us what you see?

$ bwa index -a bwtsw reference.fa 2>&1


As @Devon points out, reference.fa has to be replaced with the real file name (unless that is what your file is called).

[bwt_gen] Finished constructing BWT in 688 iterations.
[bwa_index] 3109.53 seconds elapse.
[bwa_index] Update BWT... 15.98 sec
[bwa_index] Pack forward-only FASTA... 15.56 sec
[bwa_index] Construct SA from BWT and Occ... Killed


What exit code does it give?
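One way to see the exit status is a small Python wrapper, sketched below. On POSIX systems, `subprocess` reports a child killed by signal N as a negative return code -N, while bash reports the same event as exit status 128 + N. So the "Killed" (SIGKILL, signal 9) in the log would show up as -9 here and as 137 in bash. The helper name `describe_exit` is hypothetical.

```python
import signal
import subprocess


def describe_exit(cmd):
    """Run `cmd` (a list of arguments) and describe how it terminated."""
    rc = subprocess.run(cmd).returncode
    if rc < 0:
        # POSIX: a child killed by signal N has returncode -N in Python.
        # bash reports the same event as exit status 128 + N.
        name = signal.Signals(-rc).name
        return f"killed by {name} (exit status {128 - rc} in bash)"
    return f"exited with status {rc}"
```

Usage would be something like `print(describe_exit(["bwa", "index", "-a", "bwtsw", "reference.fa"]))`. In an interactive bash session, `echo $?` right after the command gives the same information.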


Are you using the latest bwa?


I am using bwa v0.7.12.

5.1 years ago
reza.jabal ▴ 460

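OK guys, it appears to be a memory issue: the "Killed" at the end of the log means the process was terminated, most likely because it ran out of memory. I am sharing this in case anyone else runs into the same problem!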

"Construct SA from BWT and Occ" is the last step in indexing, and it is also the step that uses the most memory. The node may not have enough memory, so data keeps being swapped between RAM and disk. For a 15 GB reference genome, you may need around 25 GB of memory for this step and for the subsequent mapping. If you are using LSF/SGE, please make sure you have requested enough memory.
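Before submitting the job, it can help to compare the memory a node actually has with what this step is likely to need. The sketch below is Linux-only: it parses `MemAvailable` from `/proc/meminfo`, which reports values in kB. It also scales memory linearly from the rule of thumb quoted above (about 25 GB for a 15 GB reference). That linear scaling and the helper names are assumptions, not documented BWA requirements, and GB and GiB are treated loosely as the same unit.

```python
# Rule of thumb from the answer: ~25 GB of memory for a 15 GB reference.
# Assumption: memory scales linearly with reference size.
GB_PER_REFERENCE_GB = 25 / 15


def parse_meminfo_kb(text, field="MemAvailable"):
    """Return the value of `field` (in kB) from /proc/meminfo-style text."""
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key.strip() == field:
            return int(rest.split()[0])
    raise KeyError(field)


def estimate_required_gb(reference_gb):
    """Rough memory estimate for the SA step (hypothetical helper)."""
    return reference_gb * GB_PER_REFERENCE_GB


def enough_memory(reference_gb, meminfo_text):
    """True if MemAvailable covers the rough estimate (kB converted to GiB)."""
    available_gb = parse_meminfo_kb(meminfo_text) / 1024**2
    return available_gb >= estimate_required_gb(reference_gb)
```

For a FASTA of about 3 GB, such as a human reference, you could call `enough_memory(3.1, open("/proc/meminfo").read())`. On LSF/SGE, the limit that usually matters is the memory requested for the job, not what the node reports, so treat this only as a rough guide.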

As this is the last step, you can finish indexing by running

bwa bwt2sa ref.bwt ref.sa

instead of re-running the whole "bwa index". This step should take several hours.
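The resume step can be scripted so that it only runs when the BWT is present but the SA is not. This is a minimal Python sketch around the `bwa bwt2sa` command from the answer. It assumes the default index prefix (the FASTA file name, e.g. `human_g1k_v37.fasta.bwt`), and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def bwt2sa_command(prefix):
    """Build the `bwa bwt2sa <prefix>.bwt <prefix>.sa` argument list."""
    return ["bwa", "bwt2sa", f"{prefix}.bwt", f"{prefix}.sa"]


def finish_index(prefix):
    """Run only the final SA step if it has not completed yet."""
    if Path(f"{prefix}.sa").exists():
        return  # .sa already present, nothing to do
    if not Path(f"{prefix}.bwt").exists():
        raise FileNotFoundError(f"{prefix}.bwt not found; run `bwa index` first")
    # check=True raises CalledProcessError if bwa fails or is killed again.
    subprocess.run(bwt2sa_command(prefix), check=True)
```

For the file in this thread, that would be `finish_index("human_g1k_v37.fasta")`, run on a node (or in a job) with enough memory.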