Question

BWA mem paired end vs single end shows unusual flagstat summary

3

Entering edit mode

8.4 years ago

rrdavis ▴ 60

I used BWA mem to align some Miseq (2X300bp) reads to a human reference expecting very low mapping since human reads would be considered contaminants.

If I map read1 only I get the following output from samtools flagstat:

238834 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
147 + 0 mapped (0.06%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

which is what I would expect.

However if I map the reads in paired end mode, I receive the following flagstat out:

477668 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
307795 + 0 mapped (64.44%:-nan%)
477668 + 0 paired in sequencing
238834 + 0 read1
238834 + 0 read2
307704 + 0 properly paired (64.42%:-nan%)
307742 + 0 with itself and mate mapped
53 + 0 singletons (0.01%:-nan%)
25 + 0 with mate mapped to a different chr
11 + 0 with mate mapped to a different chr (mapQ>=5)

This output does not make sense to me and I am not sure why the % mapped is that high when read1 % mapped is so low.

Here is the output of read2 alignment only:

238834 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
153 + 0 mapped (0.06%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Why is the paired end analysis showing such high % mapped when each read separately has very low % mapped? am I not interpreting these outputs correctly? Thanks for your help!

alignment next-gen sequencing • 3.6k views

ADD COMMENT • link 8.4 years ago by rrdavis ▴ 60

0

Entering edit mode

Are the read pairs that align highly repetitive? What sort of MAPQs are you getting?

ADD REPLY • link 8.4 years ago by Devon Ryan 105k

0

Entering edit mode

Out of the 307795 reads that map in the paired end analysis, 189907 reads have a mapQ >=5 (~61%).

ADD REPLY • link 8.4 years ago by rrdavis ▴ 60

0

Entering edit mode

Hmm, that's pretty repetitive, but not enough to fully explain these results. My theory was that your sequences were repetitive enough that they had so many possible alignments that the aligner gave up, but that when they were aligned as pairs that the space became restricted enough that things then worked. I suspect that's not what happened. Anyway, perhaps looking at the PE alignments in IGV would be helpful.

ADD REPLY • link 8.4 years ago by Devon Ryan 105k

0

Entering edit mode

Thanks Devon for your input.

I checked out the PE alignments in IGV, as you suggested, and noticed that there are small little clusters (~20 bp in length) of alignments with depth of 20000 reads for some of these clusters. Both read1 and read2 are overlapping in those clusters. IGV shows that the reads are clipped pretty heavily since these clusters are about 20 bp and the reads can be 300 long.

This sequencing data is from from an amplicon assay meant to amplify a viral genome. If I take a look at the small area where some of these reads align and blast those against our viral genome I get 100% alignment.

I am still confused as to why in SE mode I get virtually no alignment but in PE I get these clusters of alignments.

ADD REPLY • link 8.4 years ago by rrdavis ▴ 60

score 0 · Answer 1 · 2017-01-26

Here is the output from BWA (version 0.7.15-r1140) in case it reveals something:

[M::bwa_idx_load_from_disk] read 0 ALT contigs 
[M::process] read 320216 sequences (80000259 bp)... 
[M::process] read 157448 sequences (39909865 bp)... 
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 80, 0, 1) 
[M::mem_pestat] skip orientation FF as there are not enough pairs 
[M::mem_pestat] analyzing insert size distribution for orientation FR... 
[M::mem_pestat] (25, 50, 75) percentile: (47, 307, 448) 
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1250) 
[M::mem_pestat] mean and std.dev: (284.54, 203.62) 
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1651) 
[M::mem_pestat] skip orientation RF as there are not enough pairs 
[M::mem_pestat] skip orientation RR as there are not enough pairs 
[M::mem_process_seqs] Processed 320216 reads in
378.248 CPU sec, 58.330 real sec 
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (1, 40, 0, 1) 
[M::mem_pestat] skip orientation FF as there are not enough pairs 
[M::mem_pestat] analyzing insert size distribution for orientation FR... 
[M::mem_pestat] (25, 50, 75) percentile: (139, 284, 397) 
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 913) 
[M::mem_pestat] mean and std.dev: (266.40, 161.44) 
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1171)
[M::mem_pestat] skip orientation RF as there are not enough pairs 
[M::mem_pestat] skip orientation RR as there are not enough pairs 
[M::mem_process_seqs] Processed 157448 reads in 165.848 CPU sec, 22.118 real sec 
[main] Version: 0.7.15-r1140 
[main] CMD: /home/rrdavis/bin/bwa-0.7.15/bwa mem -t 8 /home/rrdavis/Reference/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/version0.6.0/genome.fa /home/rrdavis/NGS_Data/fastqs/Sample-037_R1_val_1.fq.gz /home/rrdavis/NGS_Data/fastqs/Sample-037_R2_val_2.fq.gz 
[main] Real time: 88.692 sec; CPU: 552.172 sec