Question: how to compute the insert size in SAM file
0
gravatar for zhangdengwei
15 months ago by
zhangdengwei70
zhangdengwei70 wrote:

Dear all,

I am confused about how the insert length (9th column) was computed in SAM file, for instance, the two PE50 reads below were aligned to reference genome, and the 4th column is 1-based leftmost mapping POSition for each read. For CL200085426L1C001R001_4504, the insert size should be equal to the following equation,

87845148 - 87845071  + 50 = 127

the result is the same as the 9th column, while it is incorrect for CL200085426L1C001R001_4486 with insert length of 11 using the same computing method. So anyone knows how to fetch the insert size? Thanks very much!

CL200085426L1C001R001_4486      83      chr16   34103407        55      50M     =       34103466        11      
CL200085426L1C001R001_4486      163     chr16   34103466        59      50M     =       34103407        -11    

CL200085426L1C001R001_4504      99      chr5    87845071        60      50M     =       87845148        127     
CL200085426L1C001R001_4504      147     chr5    87845148        60      50M     =       87845071        -127
alignment • 666 views
ADD COMMENTlink modified 14 months ago by Biostar ♦♦ 20 • written 15 months ago by zhangdengwei70
1

The 4486 pair are an "outie" mapping — the read at the lower coordinate is the reversed one. So you would have to tell us more about your sequencing protocol for us to venture an opinion on why this has been marked as a proper pair and had an insert size calculated at all.

Indeed: Which tool produced these alignments?

ADD REPLYlink modified 15 months ago • written 15 months ago by John Marshall2.0k

My sequencing protocol is circle-Seq, which can be found in this paper.

ADD REPLYlink written 15 months ago by zhangdengwei70

I have processed CIRCLE-seq data. Are you not using the proprietary software? - https://github.com/tsailabSJ/circleseq

You do not have to specify the insert size.

ADD REPLYlink written 14 months ago by Kevin Blighe63k

Yes, thanks. I have used that software to deal with my data, due to a slight difference between circle-seq data and mine, so I had altered the analysis method in order to fit my own data.

ADD REPLYlink written 14 months ago by zhangdengwei70

here it's just MPOS -POS. Which tool produced this sam ?

ADD REPLYlink written 15 months ago by Pierre Lindenbaum129k

I used BWA mem to align paired-end reads to reference genome.

ADD REPLYlink written 15 months ago by zhangdengwei70

which version of bwa mem, it seems that the code isn't just MPOS-POS https://github.com/lh3/bwa/blob/f090f93460a0379d78a97bfa6133bae5bc8283a6/bwamem.c#L862

ADD REPLYlink written 15 months ago by Pierre Lindenbaum129k

my version of BWA is 0.7.17-r1188

ADD REPLYlink written 15 months ago by zhangdengwei70
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1499 users visited in the last hour