Question: BWA indexer fails to generate fasta.sa file!

reza.jabal (New York, USA) wrote, 3.4 years ago:

Hi everyone,

I am indexing the human reference genome with BWA using the following command:

bwa index -a bwtsw reference.fa

but it fails to generate the .rbwt, .rpac, .rsa and .sa files. Does anyone know what these files are and how I can generate the .sa file?
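To see which index files are actually present, a small Python sketch can help. It assumes the five files that bwa 0.7.x writes for an index prefix (.amb, .ann, .bwt, .pac, .sa); as far as we know, the .rbwt, .rpac and .rsa files were written by older bwa releases and are not needed by 0.7.x. The function name `missing_index_files` is hypothetical.

```python
from pathlib import Path

# Assumed file set for a complete bwa 0.7.x index (older 0.5.x releases
# also wrote .rbwt, .rpac and .rsa, which 0.7.x does not use).
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_index_files(prefix):
    """Return the expected index files that do not exist for `prefix`.

    By default `bwa index` uses the FASTA file name itself as the prefix,
    so reference.fa yields reference.fa.bwt, reference.fa.sa, and so on.
    """
    return [
        prefix + suffix
        for suffix in BWA_INDEX_SUFFIXES
        if not Path(prefix + suffix).is_file()
    ]
```

For example, `missing_index_files("human_g1k_v37.fasta")` returns an empty list for a complete index, or a list such as `["human_g1k_v37.fasta.sa"]` when only the suffix array is missing.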


Are there any error messages? Is this the exact command you're running?

Reply by pld, 3.4 years ago

BWA doesn't report any error, but I am trying to find split reads using LUMPY and it requires the fasta.sa file:

[bwt_restore_sa] fail to open file 'human_g1k_v37.fasta.sa' : No such file or directory

Reply by reza.jabal, 3.4 years ago

Are you sure you didn't mean to type human_g1k_v37.fasta instead of reference.fa?

Reply by Devon Ryan, 3.4 years ago

Is reference.fa a plain multi-fasta format file? This is a straightforward command and should work.

Can you run your command as follows and tell us what you see?

$ bwa index -a bwtsw reference.fa 2>&1

As @Devon points out, reference.fa has to be replaced with the real file name (unless that is what your file is called).

Reply by genomax, 3.4 years ago

[bwt_gen] Finished constructing BWT in 688 iterations.
[bwa_index] 3109.53 seconds elapse.
[bwa_index] Update BWT... 15.98 sec
[bwa_index] Pack forward-only FASTA... 15.56 sec
[bwa_index] Construct SA from BWT and Occ... Killed

Reply by reza.jabal, 3.4 years ago

What exit code does it give?

Reply by pld, 3.4 years ago
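The exit-code question can be answered from a script. A minimal Python sketch, assuming a `bwa` binary on PATH; the helper names `run_bwa_index` and `describe_returncode` are hypothetical. It runs the same command with stderr merged into stdout, like `2>&1`, and translates the return code. Python's `subprocess` reports a child terminated by signal N as -N, so a run ended by the kernel's out-of-memory killer typically shows up as -9 (SIGKILL); in bash, `echo $?` after the same failure would print 137 (128 + 9).

```python
import signal
import subprocess


def run_bwa_index(fasta, bwa="bwa"):
    """Run `bwa index -a bwtsw <fasta>`; return (returncode, combined output)."""
    proc = subprocess.run(
        [bwa, "index", "-a", "bwtsw", fasta],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # bwa logs progress to stderr; merge it like 2>&1
        text=True,
    )
    return proc.returncode, proc.stdout


def describe_returncode(rc):
    """Translate a subprocess return code into a short readable string."""
    if rc == 0:
        return "success"
    if rc < 0:
        # subprocess reports "terminated by signal N" as -N (POSIX only)
        return f"killed by signal {signal.Signals(-rc).name}"
    return f"exited with status {rc}"
```

Usage: `rc, log = run_bwa_index("human_g1k_v37.fasta")` followed by `print(describe_returncode(rc))`.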

Are you using the latest bwa?

Reply by genomax, 3.4 years ago

I am using bwa v0.7.12.

Reply by reza.jabal, 3.4 years ago

Answer by reza.jabal (New York, USA), 3.4 years ago:

OK guys, it appears to be a memory issue! I am sharing this in case anyone else encounters the same problem.

"Construct SA from BWT and Occ" is the last step of indexing, and it is also the step that uses the most memory. The "Killed" message at the end of my log fits this: the process was terminated, most likely because it ran out of memory. It is possible that the node does not have enough memory, so data keep being swapped between RAM and disk. For a 15 GB reference genome, you may need around 25 GB of memory for this step and for the subsequent mapping. If you are using LSF/SGE, please make sure you have requested enough memory.
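Before launching the job, it can help to compare the memory you expect to need with what the machine reports. A minimal Python sketch for Linux, assuming the `/proc/meminfo` format (values in kB) and using the 25 GB figure above only as an example threshold; the helper names `parse_meminfo` and `enough_memory` are hypothetical. On LSF/SGE, what matters is the memory the scheduler grants the job, which this check does not see.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            # Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
            info[key.strip()] = int(parts[0])
    return info


def enough_memory(required_gb, meminfo_text, field="MemAvailable"):
    """True if `field` (in kB) covers `required_gb`, treating GB as GiB."""
    kb = parse_meminfo(meminfo_text)[field]
    return kb / (1024 * 1024) >= required_gb
```

For example, `enough_memory(25, open("/proc/meminfo").read())` returns False on a node with less than 25 GiB available.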

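Since this is the last step, you can finish indexing by running:

bwa bwt2sa ref.bwt ref.sa

instead of rerunning "bwa index". Here ref stands for the index prefix, which by default is the FASTA file name (e.g. human_g1k_v37.fasta, giving human_g1k_v37.fasta.sa). This step should take several hours.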
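To automate that recovery, a minimal Python sketch follows. It assumes `bwa` is on PATH and that the earlier steps (including "Update BWT") finished, as in the log in this thread; the helper names `bwt2sa_command` and `finish_index` are hypothetical. It only calls `bwa bwt2sa` when the .sa file is missing.

```python
import subprocess
from pathlib import Path


def bwt2sa_command(prefix, bwa="bwa"):
    """Build the `bwa bwt2sa <prefix>.bwt <prefix>.sa` command line."""
    return [bwa, "bwt2sa", prefix + ".bwt", prefix + ".sa"]


def finish_index(prefix, bwa="bwa"):
    """Run bwt2sa only if <prefix>.sa is missing; return True if it ran."""
    if Path(prefix + ".sa").is_file():
        return False  # suffix array already present, nothing to do
    if not Path(prefix + ".bwt").is_file():
        # Without the BWT the last step cannot run; rerun `bwa index` instead.
        raise FileNotFoundError(prefix + ".bwt")
    # check=True raises CalledProcessError on any non-zero return code,
    # including a negative one if bwa is killed again.
    subprocess.run(bwt2sa_command(prefix, bwa), check=True)
    return True
```

Usage: `finish_index("human_g1k_v37.fasta")`.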

Powered by Biostar version 2.3.0