Question: BWA missing reads in output
0
gravatar for brendanofallon
3.8 years ago by
brendanofallon0 wrote:

I'm having trouble getting BWA mem (v0.7.12) to consistently output all reads in the input fastqs. I'm working on a simulation study, and I'm generating smallish sets of fake reads, then aligning them with bwa. The paired input fastqs have a few thousand reads each. The missing reads are all from the ends of the input fastq files as far as I can tell, and subsetting the input fastqs to focus on reads missing from a previous attempt can make the reads show back up again. It looks a little like under some circumstances BWA is forgetting to flush a buffer, or something like that.

Since I'm simulating reads I know exactly where they should go, and I give them each unique ids so I can verify which ones exist in the fastqs and the output sams. The behavior seems identical on both linux and mac.  

 Are there circumstances under which BWA will refuse to align / output a read, or set of reads? Everything in the fastqs should end up in the .sam, right? Anyone else notice anything similar?

bwa alignment • 1.4k views
ADD COMMENTlink modified 3.7 years ago by Biostar ♦♦ 20 • written 3.8 years ago by brendanofallon0

output wc -l of fastq and samtools flagstat of bam file.

ADD REPLYlink written 3.8 years ago by apelin20470

Here's a typical run for me:

bofallon@limital:/tmp-working/$ grep '^@' input_r1.fq | wc -l
1250

bofallon@limital:/tmp-working/$ grep '^@' input_r2.fq | wc -l
1250

bofallon@limital:/tmp-working/$ samtools flagstat input_r1.fq.bam
  1222 + 0 in total (QC-passed reads + QC-failed reads)
  0 + 0 secondary
  0 + 0 supplementary
  0 + 0 duplicates
  1222 + 0 mapped (100.00%:-nan%)
  1222 + 0 paired in sequencing
  611 + 0 read1
  611 + 0 read2
  1222 + 0 properly paired (100.00%:-nan%)
  1222 + 0 with itself and mate mapped
  0 + 0 singletons (0.00%:-nan%)
  0 + 0 with mate mapped to a different chr
  0 + 0 with mate mapped to a different chr (mapQ>=5)

 

1250 input read pairs... and 1222 output read pairs.

(I should mention that with this simulated data no base qualities are '@', so the grepping above accurately counts reads - this won't be true for real data)

 

ADD REPLYlink modified 3.8 years ago • written 3.8 years ago by brendanofallon0

What command line(s) did you use to go from fastq to bam?

ADD REPLYlink written 3.7 years ago by Sean Davis25k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1769 users visited in the last hour