Question: BWA mem paired end vs single end shows unusual flagstat summary
1
gravatar for rrdavis
2.2 years ago by
rrdavis20
United States
rrdavis20 wrote:

I used BWA mem to align some Miseq (2X300bp) reads to a human reference expecting very low mapping since human reads would be considered contaminants.

If I map read1 only I get the following output from samtools flagstat:

238834 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
147 + 0 mapped (0.06%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

which is what I would expect.

However if I map the reads in paired end mode, I receive the following flagstat out:

477668 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
307795 + 0 mapped (64.44%:-nan%)
477668 + 0 paired in sequencing
238834 + 0 read1
238834 + 0 read2
307704 + 0 properly paired (64.42%:-nan%)
307742 + 0 with itself and mate mapped
53 + 0 singletons (0.01%:-nan%)
25 + 0 with mate mapped to a different chr
11 + 0 with mate mapped to a different chr (mapQ>=5)

This output does not make sense to me and I am not sure why the % mapped is that high when read1 % mapped is so low.

Here is the output of read2 alignment only:

238834 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
153 + 0 mapped (0.06%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Why is the paired end analysis showing such high % mapped when each read separately has very low % mapped? am I not interpreting these outputs correctly? Thanks for your help!

sequencing next-gen alignment • 1.1k views
ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by rrdavis20

Are the read pairs that align highly repetitive? What sort of MAPQs are you getting?

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by Devon Ryan89k

Out of the 307795 reads that map in the paired end analysis, 189907 reads have a mapQ >=5 (~61%).

ADD REPLYlink written 2.2 years ago by rrdavis20

Hmm, that's pretty repetitive, but not enough to fully explain these results. My theory was that your sequences were repetitive enough that they had so many possible alignments that the aligner gave up, but that when they were aligned as pairs that the space became restricted enough that things then worked. I suspect that's not what happened. Anyway, perhaps looking at the PE alignments in IGV would be helpful.

ADD REPLYlink written 2.2 years ago by Devon Ryan89k

Thanks Devon for your input.

I checked out the PE alignments in IGV, as you suggested, and noticed that there are small little clusters (~20 bp in length) of alignments with depth of 20000 reads for some of these clusters. Both read1 and read2 are overlapping in those clusters. IGV shows that the reads are clipped pretty heavily since these clusters are about 20 bp and the reads can be 300 long.

This sequencing data is from from an amplicon assay meant to amplify a viral genome. If I take a look at the small area where some of these reads align and blast those against our viral genome I get 100% alignment.

I am still confused as to why in SE mode I get virtually no alignment but in PE I get these clusters of alignments.

ADD REPLYlink written 2.2 years ago by rrdavis20
0
gravatar for rrdavis
2.2 years ago by
rrdavis20
United States
rrdavis20 wrote:

Here is the output from BWA (version 0.7.15-r1140) in case it reveals something:

[M::bwa_idx_load_from_disk] read 0 ALT contigs 
[M::process] read 320216 sequences (80000259 bp)... 
[M::process] read 157448 sequences (39909865 bp)... 
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 80, 0, 1) 
[M::mem_pestat] skip orientation FF as there are not enough pairs 
[M::mem_pestat] analyzing insert size distribution for orientation FR... 
[M::mem_pestat] (25, 50, 75) percentile: (47, 307, 448) 
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1250) 
[M::mem_pestat] mean and std.dev: (284.54, 203.62) 
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1651) 
[M::mem_pestat] skip orientation RF as there are not enough pairs 
[M::mem_pestat] skip orientation RR as there are not enough pairs 
[M::mem_process_seqs] Processed 320216 reads in
378.248 CPU sec, 58.330 real sec 
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (1, 40, 0, 1) 
[M::mem_pestat] skip orientation FF as there are not enough pairs 
[M::mem_pestat] analyzing insert size distribution for orientation FR... 
[M::mem_pestat] (25, 50, 75) percentile: (139, 284, 397) 
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 913) 
[M::mem_pestat] mean and std.dev: (266.40, 161.44) 
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1171)
[M::mem_pestat] skip orientation RF as there are not enough pairs 
[M::mem_pestat] skip orientation RR as there are not enough pairs 
[M::mem_process_seqs] Processed 157448 reads in 165.848 CPU sec, 22.118 real sec 
[main] Version: 0.7.15-r1140 
[main] CMD: /home/rrdavis/bin/bwa-0.7.15/bwa mem -t 8 /home/rrdavis/Reference/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/version0.6.0/genome.fa /home/rrdavis/NGS_Data/fastqs/Sample-037_R1_val_1.fq.gz /home/rrdavis/NGS_Data/fastqs/Sample-037_R2_val_2.fq.gz 
[main] Real time: 88.692 sec; CPU: 552.172 sec
ADD COMMENTlink written 2.2 years ago by rrdavis20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 825 users visited in the last hour