Question: insert size discrepencies using bwa backtrack and bwa mem
2
gravatar for urmi208
2.8 years ago by
urmi20830
urmi20830 wrote:

Hi all,

I have a question regarding bwa-mem and bwa aln. I am trying to align 150 bp paired end data against a genome assembly using bwa mem. However, I am seeing a long peak of very short inserts for alignments which is not expected according to the lab reports. Just to verify I did the alignments using bwa bactrack and the peak has disappeared. Logs look very different too in terms of low and high boundries of insert length. Are there any insert size cutoffs set for both these algorithms or is there something in the way they align which causes this discrepancy?

BWA-bactrack:

[bwa_aln_core] 262144 sequences have been processed.
[bwa_aln_core] 262144 sequences have been processed.
[bwa_aln_core] calculate SA coordinate... [bwa_aln_core] calculate SA coordinate... 195.76 sec
[bwa_aln_core] write to the disk... 204.87 sec
[bwa_aln_core] write to the disk... [infer_isize] (25, 50, 75) percentile: (173, 231, 337)
[infer_isize] low and high boundaries: **150 and 665** for estimating avg and std
[infer_isize] inferred external isize from 26006 pairs: 271.186 +/- 116.759
[infer_isize] skewness: 1.187; kurtosis: 0.736; ap_prior: 3.73e-05
[infer_isize] inferred maximum insert size: 1029 (6.49 sigma)
[bwa_sai2sam_pe_core] time elapses: 5.22 sec
[bwa_sai2sam_pe_core] changing coordinates of 18261 alignments.

enter image description here

bwa-mem

[M::main_mem] read 69182 sequences (10000020 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (113, 4047, 36, 90)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (175, 331, 548)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1294)
[M::mem_pestat] mean and std.dev: (369.65, 270.10)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1667)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (144, 217, 329)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: **(1, 699)**
[M::mem_pestat] mean and std.dev: (242.81, 136.88)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 884)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (101, 208, 352)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 854)
[M::mem_pestat] mean and std.dev: (224.74, 141.14)

bwa-mem insert size distribution

Any light you could shed on this will be very helpful Many thanks.

insert bwa mem bwa-backtrack • 1.3k views
ADD COMMENTlink modified 2.8 years ago by lh331k • written 2.8 years ago by urmi20830

Not answering your question but would you mind checking the insert sizes (you can sub-sample a set if you don't want to use full files) using BBMap (if your reads don't overlap) or BBMerge (if R1/R2 overlap). Both programs are part of BBMap suite. You will want to use ihist= flag to create a file with insert size distribution.

ADD REPLYlink written 2.8 years ago by genomax64k

Sure. I will give that a go and let you know what I get.

ADD REPLYlink written 2.8 years ago by urmi20830
2
gravatar for lh3
2.8 years ago by
lh331k
United States
lh331k wrote:

You are sequencing into adapters. Bwa-aln does end-to-end alignment. It can't map reads with long adapter sequences. The min insert size can't be much smaller than the read length in your case. Therefore you see a distribution apparently truncated at ~150bp. Bwa-mem does local alignment. It soft-clips adapter sequences. The insert size can be much smaller than the read length. In all, bwa-mem is doing the right thing, except the sharp peak at ~10bp, which is probably some other artifacts caused by assembly or by mapping. Either way, that is a tiny fraction of reads.

If you are doing typical whole-genome sequencing, your library is not optimal. It is still usable, but can be improved.

PS: as there are a lot of adapters in your data, you should trim adapter before mapping. You will get better results with bwa-aln. It also helps downstream SNP calling even if you use bwa-mem.

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by lh331k

Thank you for the explanation, Heng. I have already trimmed the adapters but may be there are still some traces left in there. I will have a look.

ADD REPLYlink written 2.8 years ago by urmi20830
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2372 users visited in the last hour