insert size discrepencies using bwa backtrack and bwa mem
1
2
Entering edit mode
7.9 years ago
urmi208 ▴ 30

Hi all,

I have a question regarding bwa-mem and bwa aln. I am trying to align 150 bp paired end data against a genome assembly using bwa mem. However, I am seeing a long peak of very short inserts for alignments which is not expected according to the lab reports. Just to verify I did the alignments using bwa bactrack and the peak has disappeared. Logs look very different too in terms of low and high boundries of insert length. Are there any insert size cutoffs set for both these algorithms or is there something in the way they align which causes this discrepancy?

BWA-bactrack:

[bwa_aln_core] 262144 sequences have been processed.
[bwa_aln_core] 262144 sequences have been processed.
[bwa_aln_core] calculate SA coordinate... [bwa_aln_core] calculate SA coordinate... 195.76 sec
[bwa_aln_core] write to the disk... 204.87 sec
[bwa_aln_core] write to the disk... [infer_isize] (25, 50, 75) percentile: (173, 231, 337)
[infer_isize] low and high boundaries: **150 and 665** for estimating avg and std
[infer_isize] inferred external isize from 26006 pairs: 271.186 +/- 116.759
[infer_isize] skewness: 1.187; kurtosis: 0.736; ap_prior: 3.73e-05
[infer_isize] inferred maximum insert size: 1029 (6.49 sigma)
[bwa_sai2sam_pe_core] time elapses: 5.22 sec
[bwa_sai2sam_pe_core] changing coordinates of 18261 alignments.

enter image description here

bwa-mem

[M::main_mem] read 69182 sequences (10000020 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (113, 4047, 36, 90)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (175, 331, 548)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1294)
[M::mem_pestat] mean and std.dev: (369.65, 270.10)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1667)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (144, 217, 329)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: **(1, 699)**
[M::mem_pestat] mean and std.dev: (242.81, 136.88)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 884)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (101, 208, 352)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 854)
[M::mem_pestat] mean and std.dev: (224.74, 141.14)

bwa-mem insert size distribution

Any light you could shed on this will be very helpful Many thanks.

bwa mem insert bwa-backtrack • 3.0k views
ADD COMMENT
0
Entering edit mode

Not answering your question but would you mind checking the insert sizes (you can sub-sample a set if you don't want to use full files) using BBMap (if your reads don't overlap) or BBMerge (if R1/R2 overlap). Both programs are part of BBMap suite. You will want to use ihist= flag to create a file with insert size distribution.

ADD REPLY
0
Entering edit mode

Sure. I will give that a go and let you know what I get.

ADD REPLY
2
Entering edit mode
7.9 years ago
lh3 33k

You are sequencing into adapters. Bwa-aln does end-to-end alignment. It can't map reads with long adapter sequences. The min insert size can't be much smaller than the read length in your case. Therefore you see a distribution apparently truncated at ~150bp. Bwa-mem does local alignment. It soft-clips adapter sequences. The insert size can be much smaller than the read length. In all, bwa-mem is doing the right thing, except the sharp peak at ~10bp, which is probably some other artifacts caused by assembly or by mapping. Either way, that is a tiny fraction of reads.

If you are doing typical whole-genome sequencing, your library is not optimal. It is still usable, but can be improved.

PS: as there are a lot of adapters in your data, you should trim adapter before mapping. You will get better results with bwa-aln. It also helps downstream SNP calling even if you use bwa-mem.

ADD COMMENT
0
Entering edit mode

Thank you for the explanation, Heng. I have already trimmed the adapters but may be there are still some traces left in there. I will have a look.

ADD REPLY

Login before adding your answer.

Traffic: 1474 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6