Question: BWA MEM indexing
gravatar for david.4.ray
23 months ago by
United States
david.4.ray0 wrote:

I'm a relative newbie and trying t work my way through using bwa mem.  I've successfully aligned PE reads to a reference genome previously using bwa aln but I thought this might work better.

I'm getting the following output:

[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 2012726 reads in 2859.953 CPU sec, 143.173 real sec

I could be wrong but I think this may be a problem with my index.  I'm attempting to use the same indexed reference genome as my bwa aln work.  The bwa manual suggests that there is a different algorithm for building the index for use with bwa mem.  Is that true?  If so, I can't figure out how to implement it.  

Any help would be appreciated.


alignment assembly • 1.7k views
ADD COMMENTlink modified 23 months ago by Istvan Albert ♦♦ 73k • written 23 months ago by david.4.ray0
gravatar for Istvan Albert
23 months ago by
Istvan Albert ♦♦ 73k
University Park, USA
Istvan Albert ♦♦ 73k wrote:

The index is unrelated to these messages. What these messages indicate that the orientation and distance between the paired end reads is not what the tool expects. You can investigate the alignments yourself in the SAM file (column TLEN and other paired end related columns) or via a tool like IGV and verify the pairing information.

ADD COMMENTlink written 23 months ago by Istvan Albert ♦♦ 73k

Yeah, I was afraid it might be something less simple.  It's interesting, though, because I previously successfully assembled the genome using bwa aln and exactly the same files that I'm using here.  The only difference is that I've switched to bwa mem.  Does that change your response at all?

ADD REPLYlink written 23 months ago by david.4.ray0

You have not assembled the reads, you have aligned them.  Choosing the alignment algorithm should not affect the read  pairing that the tool reports. That being said the reports that you see are not a standardized scientific output and I would not put too much stock into that, there is little to no documentation to what those messages mean and when/how they are reported.

They may be  a warning sing that perhaps you are not running the tool correctly, for example if read pairs are listed in the wrong order you would get that. 

ADD REPLYlink modified 23 months ago • written 23 months ago by Istvan Albert ♦♦ 73k

No.  I have actually performed a full reference-based assembly of the genome using these reads.  I'll explain.  

We have a working pipeline that will index the reference genome, then process the raw reads from the genome we want to assemble using that reference.  One of the processing steps is to use trimmomatic to generate paired and unpaired read files (R1 and R2).   Those are then passed to bwa aln to map the reads to the reference, generating our first .sam files.  Those are converted to .bam files and later sorted using samtools.  Picard tools is then used to remove duplicates and then a pileup is generate, which gives rise to our final assembly.  

As I understand it, this is a pretty standard procedure.  The assembly seems valid.  We've by no means completely validated it but we can find a selected set of genes in it.

I am attempting to run the same pipeline but by using bwa mem instead. I've made what I think are the necessary adjustments.  It seemed reasonable to skip the trimmomatic step and move straight to bwa mem.  The section of the script is below.  Forgive the aliases.  RP1.....

Holy crap I just found the problem.  I'm mapping R1 and R1 in the same step rather than.  R1 and R2. I hate being me sometimes.

    $BWA_HOME/bwa mem             \
        $REF_HOME/$REFGENOME            \
           $PROCESSED_READS_HOME/$RP1    \
           $PROCESSED_READS_HOME/$RP1    \
        -t $THREADS    \
        > $ABBREV"_aln_pe.sam"     

ADD REPLYlink written 23 months ago by david.4.ray0

Yep.  That was it.  Working just fine now.

ADD REPLYlink written 23 months ago by david.4.ray0

Glad that it worked out but do note that you are not using the terms correctly. What you did with bwa is NOT an assembly. You have assembled the results that were produced by bwa. And for that you used another method, with a consensus caller, that is your assembler not bwa. That is fine of course but it causes a lot of confusion when someone calls the alignment process with bwa as an assembly. We assume that they really do something else.

ADD REPLYlink written 23 months ago by Istvan Albert ♦♦ 73k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1456 users visited in the last hour