Question: Differences In Alignments When Run Twice (Using The Same Reads/Reference As Input)
Stroehli (Berlin, Germany) wrote, 7.1 years ago:

Hi, I wonder if it is possible to get different resulting alignments using the same reads and the same reference as input?

I ran the alignment step twice with BWA, and the BAM files from the two runs differ slightly.

Is it possible for BWA to place a read differently in two runs that use exactly the same parameters and input?

Are there any "random" steps in the algorithm? Does multi-threading affect the alignment?

Furthermore, I found that the regions that differ between the two BAMs show strikingly low read quality, but I don't know whether this is related to the observed problem.

Any help or further insight is appreciated.

Cheers, Stroehli

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 7.1 years ago:

Yes, bwa calls some of the C random-number functions:

$ grep rand bwa-0.6.2/*.c | grep -v strand

bntseq.c:            if (c >= 4) c = lrand48()&3;
bntseq.c:    bns->seed = 11; // fixed seed for random generator
bntseq.c:    srand48(bns->seed);
bwa.c:    // count number of hits; randomly select one alignment
bwa.c:        if (drand48() * (p->l - p->k + 1 + cnt) > (double)cnt) {
bwa.c:            one->sa = p->k + (bwtint_t)((p->l - p->k + 1) * drand48());
bwape.c:    srand48(bns->seed);
bwase.c:            if (drand48() * (p->l - p->k + 1 + cnt) > (double)cnt) {
bwase.c:                s->sa = p->k + (bwtint_t)((p->l - p->k + 1) * drand48());
bwase.c:         * number of random hits. */
bwase.c:                    double p = 1.0, x = drand48();
bwase.c:    srand48(bns->seed);
bwtsw2_aux.c:            if (p->flag&1) q->qual = 0; // this is a random hit
bwtsw2_aux.c:            if (c >= 4) { c = (int)(drand48() * 4); ++k; } // FIXME: ambiguous bases are not properly handled
bwtsw2_aux.c:            if (c >= 4) c = (int)(drand48() * 4);
bwtsw2_core.c:    { // choose a random one
bwtsw2_core.c:        j = (int)(i * drand48());
bwtsw2_main.c:    srand48(11);
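
Note that the grep output shows a fixed seed (11), so the sequence of random draws itself repeats from run to run. One way placements could still differ is if reads consume those draws in a different order, for example when several threads process reads. That is an assumption on my part and is not confirmed anywhere in this thread. The sketch below only illustrates the idea: `pick_hit`, `place_reads`, `same_placement` and the interval [0, 999] are hypothetical and are not bwa code; only the `k + (l - k + 1) * drand48()` shape is taken from the lines above.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose srand48/drand48 under strict C modes */
#endif
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: pick one position uniformly from the interval [k, l],
 * using the same shape as `p->k + (p->l - p->k + 1) * drand48()` above. */
static long pick_hit(long k, long l)
{
    assert(l >= k);
    return k + (long)((double)(l - k + 1) * drand48());
}

/* Place n repetitive reads, each with the (made-up) candidate interval
 * [0, 999], visiting them in the order given by `order`.
 * out[read_id] receives the position chosen for that read. */
static void place_reads(const int *order, int n, long seed, long *out)
{
    srand48(seed);                      /* fixed seed, as in bwa */
    for (int i = 0; i < n; ++i)
        out[order[i]] = pick_hit(0, 999);
}

/* Returns 1 if every read got the same position in both runs, else 0. */
static int same_placement(const long *a, const long *b, int n)
{
    for (int i = 0; i < n; ++i)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Calling `place_reads` twice with the same order and seed gives identical placements. Calling it with a reversed order hands the first random draw to a different read, so an individual read can end up at a different position even though the seed never changed.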
Fred (Paris, France) wrote, 7.1 years ago:

The BWA documentation states:

"sampe [...] Generate alignments in the SAM format given paired-end reads. Repetitive read pairs will be placed randomly."

This could explain the differences, especially for the low-quality reads.


Thanks for your reply. Could you elaborate on that? Do I understand correctly that repetitive read pairs are reads that map equally well to more than one region of the reference? How does that fit with my observation that the differing reads have low quality? Are you saying that a low-quality read is more likely to map equally well (or rather equally badly) to more than one position, and is therefore placed randomly more often?

(reply written 7.1 years ago by Stroehli)

Repetitive hits are reads that map to multiple positions on the reference.

Concerning the quality, I did mean that low-quality reads may tend to map to more than one position, but this needs to be verified, because the bwa documentation states:

"Base quality is NOT considered in evaluating hits"

(reply written 7.1 years ago by Fred)
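
One way to check whether the differing reads really are repetitive hits is to look at the optional tags in the SAM records. As far as I recall from the BWA manual, `aln`/`samse`/`sampe` output carries an `XT:A:` tag with `U` for unique and `R` for repeat, so please treat that tag name as an assumption to confirm against your own files. The helper `is_repeat_hit` below is a hypothetical sketch that is not part of bwa or samtools; it works on one tab-separated SAM line (for example from `samtools view`).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a SAM record carries the optional
 * tag XT:A:R (assumed here to mark a repetitive hit), 0 otherwise. */
static int is_repeat_hit(const char *sam_line)
{
    assert(sam_line != NULL);
    const char *p = sam_line;

    /* Skip the 11 mandatory SAM columns; optional tags start at column 12. */
    for (int col = 0; col < 11; ++col) {
        p = strchr(p, '\t');
        if (p == NULL)
            return 0;               /* record has no optional fields */
        ++p;
    }

    /* Walk the tab-separated optional fields and look for an exact match. */
    while (*p != '\0') {
        if (strncmp(p, "XT:A:R", 6) == 0 &&
            (p[6] == '\t' || p[6] == '\n' || p[6] == '\0'))
            return 1;
        p = strchr(p, '\t');
        if (p == NULL)
            break;
        ++p;
    }
    return 0;
}
```

If most of the reads that differ between the two BAMs come back as repeats, that would support the random-placement explanation; if not, something else is going on.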