Question: Running BWA parallel on 1000 samples
gravatar for ebrown1955
3.9 years ago by
United States
ebrown1955300 wrote:

So I have 1000 samples (about 2000 fastq files) that I need to do "bwa mem" on. I'm trying to do this is the most time-efficient way possible. Is there a way to parallelize this command?

I figure that I can use GNU/Parallel, however what do I do with the header definition (i.e. -R)

For example, my current command reads:

parallell bwa mem -R "@RG\tID:{}\tPL:ILLUMINA\tLB:lib1" $REF {}_1.fastq {}_2.fastq > {}.sam

Does this make sense?

bwa alignment • 2.1k views
ADD COMMENTlink modified 3.9 years ago by Pierre Lindenbaum122k • written 3.9 years ago by ebrown1955300

What arguments are you supplying to parallel? Sample names/IDs, full files, what? Parallel should be able to add the value that you want to the RG header. What is the exact issue you're having?

Karl is correct, there is a trade off between parallelization and IO. Run too many at once and you'll clog up the IO and the overall runtime will be equal to that of a smaller number of simultaneous jobs.

ADD REPLYlink written 3.9 years ago by pld4.8k

Equal is not the lower bound. It can get much slower than sequential operation if the operating system tries to run them all together and has to swap out memory, or load/unload/reload data as different subtasks take over the limited cpu resources. 

ADD REPLYlink written 3.8 years ago by karl.stamm3.5k

Yep, you can slow it down to worse than serial.

ADD REPLYlink written 3.8 years ago by pld4.8k

I was hoping to pass sample IDs into parallel. For example, if an id is XX0001, then the fwd fastq is called XX0001_1.fastq and the rev is XX0001_2.fastq.

I think there would be some variant of parallel bwa mem -R "@RG\tID:{}\tPL:ILLUMINA\tLB:lib1" $REF {}_1.fastq {}_2.fastq > {}.sam

What command would actually accomplish this. Should I create a listfile with the sample IDs?

ADD REPLYlink written 3.8 years ago by ebrown1955300

You should check out the documentation on GNU parallel some more, you have to feed parallel something to use and there are tons of ways you can do that. You should try out some of the examples and play around seeing how it behaves using echo.

You could do something like this (although parsing ls is bad):

Say my files are in the format of (sampleID1_1.fq, sampleID1_2.fq) and so on:

parallel -j 4 bwa mem -t 3 -R "@RG\tID:{}\tPL:ILLUMINA\tLB:lib1" <ref genome> /input/dir/{}_1.fastq /input/dir/{}_2.fastq ">" /output/dir/{}.sam ::: `ls /input/dir/*.fastq | cut -d '_' -f1 | uniq`

Where input and output are wherever your input and outputs are/go. Again, tune -j and -t to whatever your system uses. Unless you're running these on a large number of nodes (which won't work with the above example), I wouldn't run 1000 of them at a time.

ADD REPLYlink written 3.8 years ago by pld4.8k
gravatar for Pierre Lindenbaum
3.9 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum122k wrote:

I use a make or SGE+qmake

see also: "generating workflows with XML and XSLT"

ADD COMMENTlink written 3.9 years ago by Pierre Lindenbaum122k

Could you share an example qmake script for a simple bwa/samtools alignment?

ADD REPLYlink written 3.8 years ago by dfornika1000
gravatar for karl.stamm
3.9 years ago by
United States
karl.stamm3.5k wrote:

For my computer, the bwa's built in parallelism (-t argument) can use up all the CPU and become bottlenecked by harddrive speed. How fast it can read in and write out files, or it will have used up all the CPU on a single sample.


That means you'll get as good or better speed by doing each sample sequentially. 

Just use a bash for loop to run bwa mem with -t set to your cpu count. GNU Parallel can do something like that with the --jobs parameter to limit the number of simultaneous tasks.


If you have multiple compute nodes on a grid, then my answer is wrong and you should distribute the workload.

ADD COMMENTlink modified 3.9 years ago • written 3.9 years ago by karl.stamm3.5k

-t limits the number of cores each BWA instance can use. -j (parallel) limits the number of simultaneous instances.

So parallel -j 4 bwa mem -t 4 {etc} would be 4 BWA instances each using 4 cores, for a total of 16 cores.

ADD REPLYlink written 3.9 years ago by pld4.8k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1832 users visited in the last hour