Question: bwa mem runs slowly the first time
Hatem Elshazly (Egypt) wrote 4.3 years ago:

Hi there,

I'm using bwa mem for alignment but I noticed this behavior which I don't fully understand:

I created large AWS EC2 instances (60 GB RAM and 32 cores), installed bwa, downloaded the human reference (about 3 GB), and indexed it. The bwa command is very straightforward: bwa mem human_ref.fasta input.fq

What happens is that the first time I run the command, it takes a long time. It doesn't write anything to stdout or stderr, but I noticed it is loading something into RAM (I think it's the reference index). After it loads 5 GB or so, bwa runs fast enough given the small input size (megabytes). This only happens the first time I run the command on the machine; later runs don't take this long and finish reasonably fast.

Is this normal? Why doesn't bwa take this long after the first run?

Any help is appreciated.

Thanks,
Shazly

Tags: bwa, slow, ec2, alignment, mem

I think you are right about the reference index being loaded into memory. Also, I seem to recall that the mechanism bwa uses for memory mapping of the index allows it to be reused in subsequent runs. That's why things get faster after the first run. If something else fills your memory between two runs, the second run should be slow, too.
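
A minimal sketch of one way to test this idea, assuming the speed-up comes from the operating system keeping the index files in its file cache once they have been read. The helper name warm_index is made up for illustration; the five extensions are the files that bwa index writes next to the reference. If the assumption holds, reading those files once before aligning should make even the first bwa mem run start quickly.

```shell
# Sketch: pre-read the bwa index files so they are already in the
# operating system's file cache before the first alignment.
# warm_index is a hypothetical helper, not part of bwa.
warm_index() {
    ref="$1"
    warmed=0
    # bwa index writes these five files next to the reference.
    for ext in amb ann bwt pac sa; do
        f="$ref.$ext"
        if [ -f "$f" ]; then
            # Reading the file once pulls its contents into the cache.
            cat "$f" > /dev/null
            warmed=$((warmed + 1))
        else
            echo "missing index file: $f" >&2
        fi
    done
    # Report how many index files were read.
    echo "$warmed"
}
```

Usage would look like warm_index human_ref.fasta followed by the normal bwa mem command. Whether this helps depends on the machine having enough free RAM to keep the files cached.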

written 4.3 years ago by thackl

donfreed (Mountain View, CA) wrote 4.3 years ago:

Before aligning reads, bwa must generate an index file (an FMD-index of the reference genome). The first time you run the command, the index is generated but subsequent runs can use the previously generated index file.


Thanks for the reply, but is this file saved in a tmp directory or something? I didn't find it in either the reference directory or the working directory.

written 4.3 years ago by Hatem Elshazly

The index files should be in the same directory as the reference. They should have the same base as the reference, but should also have additional extensions.

For example, if your genome is human_g1k_v37.fasta, bwa will generate human_g1k_v37.fasta.bwt and additional files.
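
A quick way to check for those files might look like the sketch below. The function name missing_index_files is hypothetical, and the list of extensions assumes the five files that bwa index normally writes (.amb, .ann, .bwt, .pac, .sa).

```shell
# Sketch: list which bwa index files are missing for a reference.
# missing_index_files is a hypothetical helper, not part of bwa.
missing_index_files() {
    ref="$1"
    missing=0
    for ext in amb ann bwt pac sa; do
        if [ ! -f "$ref.$ext" ]; then
            # Print the full name of each index file that is absent.
            echo "$ref.$ext"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every index file is present.
    [ "$missing" -eq 0 ]
}
```

For example, missing_index_files human_g1k_v37.fasta prints nothing and succeeds when the index is complete; otherwise it prints the names it could not find and returns a non-zero status.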

written 4.3 years ago by donfreed

Actually I incorrectly assumed BWA would generate the index files automatically if they are not present. I just checked and it will not, so I have no idea why BWA would run more slowly for the first run and more quickly on subsequent runs.

written 4.3 years ago by donfreed

dariober (WCIP | Glasgow | UK) wrote 4.3 years ago:

Hi, I can confirm what the OP describes and what @thackl suggests in the comment:

Align a dummy sequence file with one read to the mouse reference genome:

# First run:
time bwa mem /lustre/.../Mus_musculus_NCBI_v37/mmu.fa test.fa
...
real    0m8.466s
user    0m0.127s
sys    0m6.410s

# Second run
time bwa mem /lustre/.../Mus_musculus_NCBI_v37/mmu.fa test.fa
...
real    0m2.282s
user    0m0.116s
sys    0m2.129s

# Third run:
time bwa mem /lustre/.../Mus_musculus_NCBI_v37/mmu.fa test.fa
...
real    0m2.169s
user    0m0.100s
sys    0m2.041s

I tried on a couple of different nodes and the picture stays the same: the first run is ~4x slower than the following runs.
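
For anyone wanting to repeat the comparison, here is a rough sketch. The helper names time_runs and slowdown are made up for illustration, and date +%s only has one-second resolution, so the shell's time keyword used above remains the better tool for short runs. The two slowdown calls just redo the arithmetic on the real times reported above.

```shell
# Sketch: time a command several times and compare first vs later runs.
# time_runs and slowdown are hypothetical helper names.

# Run a command N times, printing the wall-clock seconds of each run.
time_runs() {
    n="$1"; shift
    i=1
    while [ "$i" -le "$n" ]; do
        start=$(date +%s)
        "$@" > /dev/null 2>&1
        end=$(date +%s)
        echo "run $i: $((end - start))s"
        i=$((i + 1))
    done
}

# Ratio of the first-run time to a later-run time, to one decimal place.
slowdown() {
    awk -v first="$1" -v later="$2" 'BEGIN { printf "%.1f\n", first / later }'
}

slowdown 8.466 2.282   # first vs second run; prints 3.7
slowdown 8.466 2.169   # first vs third run; prints 3.9
```

A call such as time_runs 3 bwa mem ref.fa test.fa (paths are placeholders) would then print one line per run.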
