Question

Clarification For Running Bwa In Parallel‏

6

Entering edit mode

13.2 years ago

Mohamed ▴ 50

Hi,

I tried running BWA in parallel by splitting the input FASTQ file into 'n' splits and ran single ended alignment (aln and samse step) in parallel using 'n' bwa processes. I finally merged the SAM file output from all the processes.

When I compare the merged SAM file output with the SAM file generated by running BWA on the entire FASTQ, I find that considerable number of reads differ in their alignment ( in all the splits except the first one). I was assuming that the alignment is independent per read and splitting the FASTQ file and running BWA in parallel should work.

Could someone advise me if its correct to run BWA in the above fashion ?

Thanks, Nabeel

bwa • 12k views

ADD COMMENT • link updated 13.2 years ago by Sukhi Singh 11k • written 13.2 years ago by Mohamed ▴ 50

score 5 · Answer 1 · 2012-05-04

5

Entering edit mode

13.2 years ago

lh3 33k

BWA randomly places repetitive reads. There is no way to enforce bwa to give the same output across files. You do not need to worry about that at all.

ADD COMMENT • link 13.2 years ago by lh3 33k

1

Entering edit mode

That behavior is 100% expected. Did you try blasting that first sequence? Did you notice how it aligns equally well to many sites? The "XT:A:R" tells you that the sequence matches many sites qually well. bwa just picks one to assign it to randomly. So run it twice, different reads will be randomly assigned to different places. I bet all of the reads that exhibit this behavior have the XT:A:R tag.

ADD REPLY • link 13.2 years ago by swbarnes2 15k

0

Entering edit mode

Thank you for the reply.. Yes, you are right.. All the reads have XT:A:R tag.

If I run the alignment multiple times I get the same output, i.e. the reads are assigned to the same position again and again. It is not giving random positions each time.

This alignment difference occurs only between the merged (from the split fastq files) scenario and the complete output ( from the entire fastq file). Even if bwa randomly assigns positions to the reads, it does that consistently which I verified by running both the scenarios multiple times.

ADD REPLY • link 13.2 years ago by Mohamed ▴ 50

1

Entering edit mode

You know, for generating random numbers, you need a seed. BWA always uses the same seed and thus the sequence of random numbers is always the same, but not when you run on two separate files.

ADD REPLY • link 13.2 years ago by lh3 33k

0

Entering edit mode

Now I understand.. Thanks for the explanation. This is my final question..

Can I safely neglect the difference in alignment output as it is not a correctness issue ? As pointed out earlier by swbarnes2, it is just that the reads align equally well to different locations and one of them is chosen in random.

So, is it valid and acceptable to divide the input fastq into multiple splits and run different BWA processes in parallel and finally merge the output ?

Thanks again for all your help..

ADD REPLY • link 13.2 years ago by Mohamed ▴ 50

0

Entering edit mode

Yes, that is acceptable. However, since there is no way to tell where the reads that map to multiple locations actually came from, there will be ambiguity in placing those reads at the position BWA has mapped to in that particular run. This is also why most of the analysts prefer to work on unique reads.

ADD REPLY • link 13.2 years ago by Arun 2.4k

0

Entering edit mode

Thanks Arun for the explanation..

ADD REPLY • link 13.2 years ago by Mohamed ▴ 50

0

Entering edit mode

I am not sure if I understand what you refer as "repetitive reads". Am attempting a single ended alignment and I get different alignment output as below:

Sample Read1:

SRR062634.1918792 0 1 184972930 0 100M * 0 0 CCATTACATAATGGTAAAGGGATCAATTCAACAAGAAGAGCTAACTATCCTAAATATATATGCACCCAATACAGGAGCACCCAGATTCATAAAGCAAGTC GGEGGGGGGFGEGGGGGGFA?EEEEGGDDDGGFGFGGGGBFGGGGGGGEGGGFDEGGGGEGF=EG?EGGEGBGEEEEEEEEEEBE@AC@?C=CBACCA;@ XT:A:R NM:i:0 X0:i:1056 XM:i:0 XO:i:0 XG:i:0 MD:Z:100

SRR062634.1918792 0 8 47856174 0 100M * 0 0 CCATTACATAATGGTAAAGGGATCAATTCAACAAGAAGAGCTAACTATCCTAAATATATATGCACCCAATACAGGAGCACCCAGATTCATAAAGCAAGTC GGEGGGGGGFGEGGGGGGFA?EEEEGGDDDGGFGFGGGGBFGGGGGGGEGGGFDEGGGGEGF=EG?EGGEGBGEEEEEEEEEEBE@AC@?C=CBACCA;@ XT:A:R NM:i:0 X0:i:1056 XM:i:0 XO:i:0 XG:i:0 MD:Z:100

Sample Read 2

SRR062634.1918838 0 GL000226.1 14792 0 100M * 0 0 CAGAAACTTCTTTGTGATGTGTGCATTCAAGTCACAGAGTTGAACATTCCCTTTCGTACAGCAGTTTTTAAACACTCTTTCTGTAGTATCTGGAAGTGAA DDDDDDDDDDDDDDCDDDDDDDDDDDDDDDDCDDDCDBDBDCBBB@CBBBBBBBCBBBBBBBABB@BBBBBB=BB=BBBBABBB?BAABABB??AB?BA? XT:A:R NM:i:1 X0:i:8 X1:i:3XM:i:1 XO:i:0 XG:i:0 MD:Z:68G31

SRR062634.1918838 0 GL000226.1 2516 0 100M * 0 0 CAGAAACTTCTTTGTGATGTGTGCATTCAAGTCACAGAGTTGAACATTCCCTTTCGTACAGCAGTTTTTAAACACTCTTTCTGTAGTATCTGGAAGTGAA DDDDDDDDDDDDDDCDDDDDDDDDDDDDDDDCDDDCDBDBDCBBB@CBBBBBBBCBBBBBBBABB@BBBBBB=BB=BBBBABBB?BAABABB??AB?BA? XT:A:R NM:i:1 X0:i:8 X1:i:3XM:i:1 XO:i:0 XG:i:0 MD:Z:68G31

Around 4000 reads differ out of 3,00,000 reads.

ADD REPLY • link 13.2 years ago by Mohamed ▴ 50

0

Entering edit mode

But in the first split, everything matches. If randomness was involved and the first split contains a few 'multimappers', that is, reads with MAPQ=0, there should be some differences. Can you elaborate on that a bit?

ADD REPLY • link 13.2 years ago by Marvin ▴ 900

score 4 · Answer 2 · 2012-05-04

4

Entering edit mode

13.2 years ago

Sukhi Singh 11k

Hi, Why dont you try pBWA which runs on MPI (message parsing interface) by self. You dont need to split fastq files for that.

So, for a reference my qsub script for running the pBWA on 24 nodes looks like this :

     # pBWA script for aligning the raw fastq short read file to the reference genome of mouse (mm9)
    # Author : Sukhdeep Singh
    # Usage : qsub ~/src/useFul/pBWA/pBWA_raw2sam.sh $1 $2 - not being used YET
    # $1 = fastqfile, $2= number of cpu
    ######################################################################

    #!/bin/bash
    # qsub parameters beginning with #$
    # Use bash as the interpreter
    #$ -S /bin/bash
    #before executing the command thange to the working directory
    #$ -cwd
    #submit the complete environment to the execution host
    #$ -V
    # merge stdout and stderr
    #$ -j y
    file=$1
    # set your output file
    #$ -o out.txt

    # now initialize the parallel environment with 24 cpus
    #$ -pe orte 24

    #now we can run our command here.
    /opt/openmpi/bin/mpirun -np 24 --mca btl_tcp_if_include eth0 pBWA aln -f $file.sai /biodata/biodb/ABG/genomes/mouse/mm9/mm9 $1
    /opt/openmpi/bin/mpirun -np 24 --mca btl_tcp_if_include eth0 pBWA samse -f $file.sam /biodata/biodb/ABG/genomes/mouse/mm9/mm9 $file.sai $1

Then, I get 24 sai indexes and 24 sam files for each auto splitted fastq files, which I concatenate and sort later on.

Cheers

ADD COMMENT • link 12.7 years ago by Sukhi Singh 11k

0

Entering edit mode

Thanks Sukhdeep.. I have come across pBWA before, but I have some restrictions in running MPI code in the environment I am looking at..

ADD REPLY • link 13.2 years ago by Mohamed ▴ 50

0

Entering edit mode

I'm trying this on my cluster which has 2 compute nodes with 8 cpus each. The ideal setup will be to run bwa aln process over 16 cpus over the 2 compute nodes.

I've SGE and MPI, etc..

While running the script and submit with qsub it just splits the process in multiple parts but it's just using one of my compute nodes.

So for me -pe orte 16 and -np 16.

It's spawning 16 process over 1 node, i need the 16 processes to split up (8 procs on node 0 and 8 procs on node 1).

Thanks !

ADD REPLY • link 12.7 years ago by guillermo.marco • 0

0

Entering edit mode

Are you running pBWA or BWA. as BWA in not a MPI program, so it wont split up over other nodes.

ADD REPLY • link 12.7 years ago by Sukhi Singh 11k

0

Entering edit mode

pBWA:

  #!/bin/bash
### shell
#$ -S /bin/bash
### env path
#$ -V
### name
#$ -N aln_left
### current work directory
#$ -cwd
### merge outputs
#$ -j y
### PE
#$ -pe mpi 16
### select all.q
#$ -q all.q


mpirun pBWA aln -f aln_left /data_in/references/genomes/human/hg19/bwa_ref/hg19.fa /data_in/rawdata/HapMap_1.fastq > /data_out_2/tmp/mpi/HapMap_1.cloud.left.sai

What's the meaning of --mca on your mpirun command? How do you combine all the files after the process.

ADD REPLY • link 12.6 years ago by guillermo.marco • 0

0

Entering edit mode

Try declaring the number of processors again in the mpirun command and check, -np 16

ADD REPLY • link 12.6 years ago by Sukhi Singh 11k

0

Entering edit mode

I suppose this is the sai files you were talking about:

-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00000.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00001.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00002.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00003.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00004.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00005.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00006.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00007.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00008.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00009.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00010.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00011.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00012.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00013.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00014.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00015.sai

Ok I already tried with -np 16 yesterday. It took like 4 hours to align 1 lane. Using all my 2 nodes (16cpus). While with normal BWA i could split left and right lane align with 8 threads per node and get the alignment part done in 4 ~ 5 hours for both.

Today I test pBWA with 8 threads option -t 8 and no mpi, orte or whatever and it took only 45 min to get align of one lane. So what's the point of using 16 cpu's and split results if it's much slower.

ADD REPLY • link 12.6 years ago by guillermo.marco • 0

0

Entering edit mode

I think the problem is somewhere in splitting the jobs over two nodes, as you said it takes 45 mins to using -t 8, so it efficiently uses single node but 4 hours using both nodes, doesn't makes sense as I think its not running in parallel, just one after the other. Do you have some sys admin, there to help you out or just try the pBWA people!!

ADD REPLY • link 12.6 years ago by Sukhi Singh 11k

0

Entering edit mode

I'm a sys admin myself.

It should be a RAM issue. If you've enough RAM per process i think splitting the whole task into lot of small chunks will be the best. I've powerfull cpus but i lack lot of RAM.

If anyone with a big cluster 100GB + RAM could test this. It would be awesome.

ADD REPLY • link 12.6 years ago by guillermo.marco • 0