Question: Clarification For Running Bwa In Parallel‏
4
gravatar for Mohamed
7.0 years ago by
Mohamed40
Mohamed40 wrote:

Hi,

I tried running BWA in parallel by splitting the input FASTQ file into 'n' splits and ran single ended alignment (aln and samse step) in parallel using 'n' bwa processes. I finally merged the SAM file output from all the processes.

When I compare the merged SAM file output with the SAM file generated by running BWA on the entire FASTQ, I find that considerable number of reads differ in their alignment ( in all the splits except the first one). I was assuming that the alignment is independent per read and splitting the FASTQ file and running BWA in parallel should work.

Could someone advise me if its correct to run BWA in the above fashion ?

Thanks, Nabeel

bwa • 6.4k views
ADD COMMENTlink written 7.0 years ago by Mohamed40
5
gravatar for lh3
7.0 years ago by
lh331k
United States
lh331k wrote:

BWA randomly places repetitive reads. There is no way to enforce bwa to give the same output across files. You do not need to worry about that at all.

ADD COMMENTlink written 7.0 years ago by lh331k
1

That behavior is 100% expected. Did you try blasting that first sequence? Did you notice how it aligns equally well to many sites? The "XT:A:R" tells you that the sequence matches many sites qually well. bwa just picks one to assign it to randomly. So run it twice, different reads will be randomly assigned to different places. I bet all of the reads that exhibit this behavior have the XT:A:R tag.

ADD REPLYlink written 7.0 years ago by swbarnes25.3k

Thank you for the reply.. Yes, you are right.. All the reads have XT:A:R tag.

If I run the alignment multiple times I get the same output, i.e. the reads are assigned to the same position again and again. It is not giving random positions each time.

This alignment difference occurs only between the merged (from the split fastq files) scenario and the complete output ( from the entire fastq file). Even if bwa randomly assigns positions to the reads, it does that consistently which I verified by running both the scenarios multiple times.

ADD REPLYlink written 7.0 years ago by Mohamed40
1

You know, for generating random numbers, you need a seed. BWA always uses the same seed and thus the sequence of random numbers is always the same, but not when you run on two separate files.

ADD REPLYlink written 7.0 years ago by lh331k

Now I understand.. Thanks for the explanation. This is my final question..

Can I safely neglect the difference in alignment output as it is not a correctness issue ? As pointed out earlier by swbarnes2, it is just that the reads align equally well to different locations and one of them is chosen in random.

So, is it valid and acceptable to divide the input fastq into multiple splits and run different BWA processes in parallel and finally merge the output ?

Thanks again for all your help..

ADD REPLYlink written 7.0 years ago by Mohamed40

Yes, that is acceptable. However, since there is no way to tell where the reads that map to multiple locations actually came from, there will be ambiguity in placing those reads at the position BWA has mapped to in that particular run. This is also why most of the analysts prefer to work on unique reads.

ADD REPLYlink written 7.0 years ago by Arun2.3k

Thanks Arun for the explanation..

ADD REPLYlink written 7.0 years ago by Mohamed40

I am not sure if I understand what you refer as "repetitive reads". Am attempting a single ended alignment and I get different alignment output as below:

Sample Read1:

SRR062634.1918792 0 1 184972930 0 100M * 0 0 CCATTACATAATGGTAAAGGGATCAATTCAACAAGAAGAGCTAACTATCCTAAATATATATGCACCCAATACAGGAGCACCCAGATTCATAAAGCAAGTC GGEGGGGGGFGEGGGGGGFA?EEEEGGDDDGGFGFGGGGBFGGGGGGGEGGGFDEGGGGEGF=EG?EGGEGBGEEEEEEEEEEBE@AC@?C=CBACCA;@ XT:A:R NM:i:0 X0:i:1056 XM:i:0 XO:i:0 XG:i:0 MD:Z:100

SRR062634.1918792 0 8 47856174 0 100M * 0 0 CCATTACATAATGGTAAAGGGATCAATTCAACAAGAAGAGCTAACTATCCTAAATATATATGCACCCAATACAGGAGCACCCAGATTCATAAAGCAAGTC GGEGGGGGGFGEGGGGGGFA?EEEEGGDDDGGFGFGGGGBFGGGGGGGEGGGFDEGGGGEGF=EG?EGGEGBGEEEEEEEEEEBE@AC@?C=CBACCA;@ XT:A:R NM:i:0 X0:i:1056 XM:i:0 XO:i:0 XG:i:0 MD:Z:100

Sample Read 2

SRR062634.1918838 0 GL000226.1 14792 0 100M * 0 0 CAGAAACTTCTTTGTGATGTGTGCATTCAAGTCACAGAGTTGAACATTCCCTTTCGTACAGCAGTTTTTAAACACTCTTTCTGTAGTATCTGGAAGTGAA DDDDDDDDDDDDDDCDDDDDDDDDDDDDDDDCDDDCDBDBDCBBB@CBBBBBBBCBBBBBBBABB@BBBBBB=BB=BBBBABBB?BAABABB??AB?BA? XT:A:R NM:i:1 X0:i:8 X1:i:3XM:i:1 XO:i:0 XG:i:0 MD:Z:68G31

SRR062634.1918838 0 GL000226.1 2516 0 100M * 0 0 CAGAAACTTCTTTGTGATGTGTGCATTCAAGTCACAGAGTTGAACATTCCCTTTCGTACAGCAGTTTTTAAACACTCTTTCTGTAGTATCTGGAAGTGAA DDDDDDDDDDDDDDCDDDDDDDDDDDDDDDDCDDDCDBDBDCBBB@CBBBBBBBCBBBBBBBABB@BBBBBB=BB=BBBBABBB?BAABABB??AB?BA? XT:A:R NM:i:1 X0:i:8 X1:i:3XM:i:1 XO:i:0 XG:i:0 MD:Z:68G31

Around 4000 reads differ out of 3,00,000 reads.

ADD REPLYlink modified 7.0 years ago • written 7.0 years ago by Mohamed40

But in the first split, everything matches. If randomness was involved and the first split contains a few 'multimappers', that is, reads with MAPQ=0, there should be some differences. Can you elaborate on that a bit?

ADD REPLYlink written 7.0 years ago by Marvin840
4
gravatar for Sukhdeep Singh
7.0 years ago by
Sukhdeep Singh9.7k
Netherlands
Sukhdeep Singh9.7k wrote:

Hi, Why dont you try pBWA which runs on MPI (message parsing interface) by self. You dont need to split fastq files for that.

So, for a reference my qsub script for running the pBWA on 24 nodes looks like this :

     # pBWA script for aligning the raw fastq short read file to the reference genome of mouse (mm9)
    # Author : Sukhdeep Singh
    # Usage : qsub ~/src/useFul/pBWA/pBWA_raw2sam.sh $1 $2 - not being used YET
    # $1 = fastqfile, $2= number of cpu
    ######################################################################

    #!/bin/bash
    # qsub parameters beginning with #$
    # Use bash as the interpreter
    #$ -S /bin/bash
    #before executing the command thange to the working directory
    #$ -cwd
    #submit the complete environment to the execution host
    #$ -V
    # merge stdout and stderr
    #$ -j y
    file=$1
    # set your output file
    #$ -o out.txt

    # now initialize the parallel environment with 24 cpus
    #$ -pe orte 24

    #now we can run our command here.
    /opt/openmpi/bin/mpirun -np 24 --mca btl_tcp_if_include eth0 pBWA aln -f $file.sai /biodata/biodb/ABG/genomes/mouse/mm9/mm9 $1
    /opt/openmpi/bin/mpirun -np 24 --mca btl_tcp_if_include eth0 pBWA samse -f $file.sam /biodata/biodb/ABG/genomes/mouse/mm9/mm9 $file.sai $1

Then, I get 24 sai indexes and 24 sam files for each auto splitted fastq files, which I concatenate and sort later on.

Cheers

ADD COMMENTlink modified 6.5 years ago • written 7.0 years ago by Sukhdeep Singh9.7k

Thanks Sukhdeep.. I have come across pBWA before, but I have some restrictions in running MPI code in the environment I am looking at..

ADD REPLYlink written 7.0 years ago by Mohamed40

I'm trying this on my cluster which has 2 compute nodes with 8 cpus each. The ideal setup will be to run bwa aln process over 16 cpus over the 2 compute nodes.

I've SGE and MPI, etc..

While running the script and submit with qsub it just splits the process in multiple parts but it's just using one of my compute nodes.

So for me -pe orte 16 and -np 16.

It's spawning 16 process over 1 node, i need the 16 processes to split up (8 procs on node 0 and 8 procs on node 1).

Thanks !

ADD REPLYlink modified 6.5 years ago • written 6.5 years ago by guillermo.marco0

Are you running pBWA or BWA. as BWA in not a MPI program, so it wont split up over other nodes.

ADD REPLYlink written 6.5 years ago by Sukhdeep Singh9.7k

pBWA:

  #!/bin/bash
### shell
#$ -S /bin/bash
### env path
#$ -V
### name
#$ -N aln_left
### current work directory
#$ -cwd
### merge outputs
#$ -j y
### PE
#$ -pe mpi 16
### select all.q
#$ -q all.q


mpirun pBWA aln -f aln_left /data_in/references/genomes/human/hg19/bwa_ref/hg19.fa /data_in/rawdata/HapMap_1.fastq > /data_out_2/tmp/mpi/HapMap_1.cloud.left.sai

What's the meaning of --mca on your mpirun command? How do you combine all the files after the process.

ADD REPLYlink modified 6.4 years ago • written 6.4 years ago by guillermo.marco0

Try declaring the number of processors again in the mpirun command and check, -np 16

ADD REPLYlink written 6.4 years ago by Sukhdeep Singh9.7k

I suppose this is the sai files you were talking about:

-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00000.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00001.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00002.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00003.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00004.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00005.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00006.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00007.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00008.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00009.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00010.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00011.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00012.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00013.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00014.sai
-rw-r--r-- 1 gmarco users     0 Nov 13 13:30 aln_left-1-00015.sai

Ok I already tried with -np 16 yesterday. It took like 4 hours to align 1 lane. Using all my 2 nodes (16cpus). While with normal BWA i could split left and right lane align with 8 threads per node and get the alignment part done in 4 ~ 5 hours for both.

Today I test pBWA with 8 threads option -t 8 and no mpi, orte or whatever and it took only 45 min to get align of one lane. So what's the point of using 16 cpu's and split results if it's much slower.

ADD REPLYlink modified 6.4 years ago • written 6.4 years ago by guillermo.marco0

I think the problem is somewhere in splitting the jobs over two nodes, as you said it takes 45 mins to using -t 8, so it efficiently uses single node but 4 hours using both nodes, doesn't makes sense as I think its not running in parallel, just one after the other. Do you have some sys admin, there to help you out or just try the pBWA people!!

ADD REPLYlink written 6.4 years ago by Sukhdeep Singh9.7k

I'm a sys admin myself.

It should be a RAM issue. If you've enough RAM per process i think splitting the whole task into lot of small chunks will be the best. I've powerfull cpus but i lack lot of RAM.

If anyone with a big cluster 100GB + RAM could test this. It would be awesome.

ADD REPLYlink written 6.4 years ago by guillermo.marco0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1450 users visited in the last hour