Hi all,
I have a pair of compressed paired-end files:
L1_R1.fastq.bz2
L1_R2.fastq.bz2
and I want to run bwa mem with multiple threads on them. As the files are compressed with bzip2, I am using a shell script that pipes two decompression commands into bwa mem at once. The code is as follows:
bwa mem -t20 /projects/ref.fa \
<(pbzip2 -kdc -m5000 -p12 /projects/L1_R1.fastq.bz2) \
<(pbzip2 -kdc -m5000 -p12 /projects/L1_R2.fastq.bz2) \
| samtools view -u -F4 | samtools sort -@8 -o /projects/linearL1.bam
I am using pbzip2, which decompresses using multiple cores (-p12). Additionally, I set the -t20 parameter for bwa mem. But when I run the script, I see that only two threads are processing the data, and I want to use more threads to make it faster.
So my question is: what am I missing in my script to use more threads? What is the problem?
Additionally, I have multiple files, and I am wondering whether I can pass all of them to a single bwa mem run. Example of my file organisation:
L1_R1.fastq.bzip2
L1_R2.fastq.bzip2
L2_R1.fastq.bzip2
L2_R2.fastq.bzip2
L3_R1.fastq.bzip2
L3_R2.fastq.bzip2
At some point IO becomes limiting. Anyway, if bwa mem is using all 20 cores, then decompressing the files faster won't make any difference. Further, bwa, like pretty much every program in existence, has various steps with various levels of parallelization, so if you're looking at its CPU usage you might just be seeing a step where its worker threads are dumping to disk (that will be single-threaded by nature).
But it is not using all 20 cores! Yes, I am looking at CPU usage, but it is totally different from what I see when I pass an uncompressed file to it...
Then something else is the bottleneck. Programs don't scale linearly forever; there are various limitations throughout both your system and program architectures.
I think you might be better off trying something like this:
Decompress the two files, each with half of your machine's CPU power, and only afterwards send those two decompressed files to bwa with all cores. That way you use the full capacity of your machine and avoid (what I think is) the rate-limiting decompression step in your streaming approach. Yes, you'll have to (temporarily) give in on storage space efficiency.
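A minimal sketch of that two-step approach, assuming a 20-core machine, the tools from the question, and enough scratch space for the uncompressed FASTQ files. The function name decompress_then_align and the 10/10 thread split are illustrative assumptions, and this has not been run on your data:

```shell
#!/usr/bin/env bash
# Sketch: decompress both mates in parallel, then align with all cores.
# Function name, thread counts, and file locations are assumptions.
decompress_then_align() {
    local ref=$1 r1_bz2=$2 r2_bz2=$3 out_bam=$4
    local r1=${r1_bz2%.bz2} r2=${r2_bz2%.bz2}

    # Step 1: two decompressions side by side, 10 threads each (half of 20).
    pbzip2 -dkc -p10 "$r1_bz2" > "$r1" &
    pbzip2 -dkc -p10 "$r2_bz2" > "$r2" &
    wait   # block until both background jobs have finished

    # Step 2: bwa mem gets 20 threads and reads plain FASTQ from disk.
    bwa mem -t20 "$ref" "$r1" "$r2" \
        | samtools view -u -F4 - \
        | samtools sort -@8 -o "$out_bam" -

    # Step 3: remove the temporary uncompressed copies.
    rm -f "$r1" "$r2"
}

# Example call (paths taken from the question):
# decompress_then_align /projects/ref.fa \
#     /projects/L1_R1.fastq.bz2 /projects/L1_R2.fastq.bz2 /projects/linearL1.bam
```

The original .bz2 files are kept (-k), so the only extra cost is the temporary disk space for the two FASTQ files while bwa is running.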
On the second part of your question: why would you want to put all those files into a single bwa run? If you split them up, it will process faster.
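As a rough sketch of splitting them up, the loop below runs one bwa mem per lane pair. It assumes the files end in .bz2 and follow the L1_R1 / L1_R2 naming from the question; the helper names mate_of, lane_of, and align_all_lanes are made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: one bwa mem run per lane pair rather than one run over everything.
# Helper names and the directory layout are assumptions for illustration.

# Derive the R2 file name from an R1 file name
# (L1_R1.fastq.bz2 -> L1_R2.fastq.bz2).
mate_of() {
    printf '%s\n' "${1%_R1.fastq.bz2}_R2.fastq.bz2"
}

# Derive the lane prefix from an R1 path (/projects/L1_R1.fastq.bz2 -> L1).
lane_of() {
    local base=${1##*/}
    printf '%s\n' "${base%%_R1.*}"
}

align_all_lanes() {
    local ref=$1 dir=$2 r1 r2 lane
    for r1 in "$dir"/*_R1.fastq.bz2; do
        [ -e "$r1" ] || continue      # skip if the glob matched nothing
        r2=$(mate_of "$r1")
        lane=$(lane_of "$r1")
        # Same pipeline as in the question, once per lane.
        bwa mem -t20 "$ref" \
            <(pbzip2 -dkc "$r1") \
            <(pbzip2 -dkc "$r2") \
            | samtools view -u -F4 - \
            | samtools sort -@8 -o "$dir/linear${lane}.bam" -
    done
}

# Example call: align_all_lanes /projects/ref.fa /projects
```

As written, the lanes run one after another. The speed-up comes from launching the per-lane runs as separate jobs, for example on different nodes, which this loop does not do by itself.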
Thanks, but I am wondering what is wrong with my code! If there is no problem, then the approach you suggested will be slower than mine, as you are doing the two things separately. Theoretically, with piping you should at least not be slower! And my problem is that I want to be faster and memory-efficient using piping. And thanks for your answer to the second part. You're right.
I don't think anything is wrong with your code, as it does seem to work, right? As Devon Ryan also mentioned, you're likely facing a bottleneck somewhere, but it is hard to say where or what, I'm afraid.
This holds true for single-core processes, but I'm not sure how it will go in your set-up. I can think of a scenario where, for instance, the data is not being decompressed rapidly enough for bwa to read it at full capacity, hence the whole process will go slower. In your specific case you are also asking for more cores than you have. Your server has 20 cores and you ask for 20+12+12 = 44 in total, so parts of your pipeline might be competing for resources.
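A quick way to check that arithmetic against the machine you are on is sketched below. total_threads is a hypothetical helper, not part of bwa or pbzip2, and the core count is read with getconf, which I am assuming is available on your system:

```shell
#!/usr/bin/env bash
# Sketch: compare the threads a pipeline asks for with the cores available.
# total_threads is a hypothetical helper written for this example.
total_threads() {
    local sum=0 n
    for n in "$@"; do
        sum=$((sum + n))
    done
    printf '%s\n' "$sum"
}

requested=$(total_threads 20 12 12)          # bwa -t20 plus two pbzip2 -p12
available=$(getconf _NPROCESSORS_ONLN)       # logical cores currently online

if [ "$requested" -gt "$available" ]; then
    echo "oversubscribed: $requested threads requested, $available cores available"
else
    echo "ok: $requested threads requested, $available cores available"
fi
```

Counting the -@8 given to samtools sort as well would bring the request to 52.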
I understand, but that's not really the way to go then: piped command lines are always more memory-intensive than reading from disk, as the whole command line needs to be handled in memory, so you will give in on memory efficiency. Yes, it might go faster, as you are indeed eliminating the time-consuming steps of reading from and writing to disk.
Might look so, but I beg to differ. The approach I suggest uses the full capacity of your server for the whole duration of the process(es). And I'm not doing all steps separately; only the decompression is done in two steps. bwa is running at full CPU capacity, so there is no loss of efficiency there (even a gain, as it can read from disk at full capacity).
The buffer size of a pipe is relatively small, at least on Linux, so there shouldn't be too much of a memory hit from piping.
Thanks for your COMPLETE reply. Yes, I was answering fast and missed some of the points. Thanks for pointing that out. I have to add that the machine has more than 44 cores. Thanks again.