IGV terminology questions: "inferred" insert size vs. "expected" insert size and other
5.8 years ago

What is the difference between "inferred" and "expected" insert size? I think the "inferred" insert size can be calculated from the read pair's positions on the reference genome, but how is the "expected" insert size determined?

The following example is copied from IGV and shows the information for a read from amplicon-based target enrichment. My questions are:

1) "Read length" is listed as 151bp: does "Read length" include both the sample genomic DNA fragment length (125bp, listed as "Reference span" below) and the PCR primer length ("Clipping = left 26 soft")?

2) Does "Mate" mean the paired read for this read?

3) Does the "+" sign in "Reference span = chr1:43,814,926-43,815,050 (+) = 125bp" mean that this read is a forward read along the reference sequence strand?

4) Vice versa, does the "-" sign in "Mate start = chr1:43,815,030 (-)" mean that the paired read is a reverse read along the reference sequence, and that "Mate start" is actually the end position of the paired read? If so, the genomic coordinate where sequencing of the paired read really started can be calculated from the "Read length" of 151 bp, which gives chr1:43,815,180. For this read, based on a "Reference span" of 126 bp (not including the 25 bp of primer), it seems reasonable to place the sequencing start at chr1:43,814,901. From the sequencing start of this read (chr1:43,814,901) and of its paired read (chr1:43,815,180), the insert size should be 280 bp (= 43,815,180 - 43,814,901 + 1), if "insert size" is defined as the insert (a sample genomic DNA fragment plus a 25 bp PCR primer at each end) between the two universal adaptors ligated to its two ends. However, "Insert size" is listed below as 212 bp.

I think I have some of the definitions (insert, insert size, mate, mate start, reference span, etc.) wrong somewhere here. I would greatly appreciate it if somebody could help clarify. Thanks so much:

{{{

Read name = NS500789:146:H25JHAFXY:1:11311:16523:1330

Sample = Sample

Library = Sample

Read group = Sample

Read length = 151bp
----------------------
Mapping = Primary @ MAPQ 70

Reference span = chr1:43,814,926-43,815,050 (+) = 125bp

Cigar = 26S125M

Clipping = left 26 soft
----------------------
Location = chr1:43,814,973

Base = T @ QV 14
----------------------
Mate is mapped = yes

Mate start = chr1:43815030 (-)

Insert size = 212

Second in pair

Pair orientation = F2R1
----------------------

}}}
next-gen • 2.7k views
ADD COMMENT

1) yes

2) yes

3) yes

4) yes

"Based on the starting position genomic coordinates of this read (chr1:43,814,901)"

The starting position is chr1:43,814,926, as stated by IGV. Soft-clipped bases don't enter the calculation, and you don't have to correct for them.
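As a rough illustration of why the soft clip does not shift the start, the reference span can be recomputed from the reported start and the CIGAR string. This is a minimal sketch: the helper name `cigar_lengths` is made up for this example, and the sets of operations that consume read or reference bases follow the SAM format's CIGAR table.

```python
import re

# CIGAR operations that consume reference bases / read bases (SAM format).
REF_OPS = set("MDN=X")
READ_OPS = set("MIS=X")


def cigar_lengths(cigar):
    """Return (read_length, reference_length) implied by a CIGAR string.

    Hypothetical helper written for this example, not part of IGV or samtools.
    """
    read_len = ref_len = 0
    for count, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(count)
        if op in READ_OPS:
            read_len += n  # soft-clipped bases count toward the read length
        if op in REF_OPS:
            ref_len += n   # but they do not consume reference positions
    return read_len, ref_len


start = 43_814_926  # leftmost aligned base, as reported by IGV
read_len, ref_len = cigar_lengths("26S125M")
end = start + ref_len - 1  # 1-based, inclusive end of the alignment

print(read_len, ref_len, end)  # 151 125 43815050
```

The 26 soft-clipped bases are part of the 151 bp read, but the alignment itself covers only the 125 matched bases, ending at 43,815,050 as in the "Reference span" line.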

ADD REPLY

Hi H. Mon,

Thank you so much for the reply. However, I am still not getting the insert size that IGV reports. Following your reply, the start position for the insert size calculation is chr1:43,814,926 as stated by IGV (the 26 soft-clipped bases don't enter the calculation), and the start position of the mate (the paired read of this read) is chr1:43,815,030, also as stated by IGV. Assuming a read length of 125 bp for both reads of the pair, i.e. without counting the 26 soft-clipped bases (the read length is otherwise 151 bp with the soft-clipped bases included, as stated by IGV), I calculate an insert size of 230 bp [= (43,815,030 - 43,814,926 + 1) + 125], which is still not the same as the "Insert size" of 212 bp shown above. Could you clarify your comment further? Thanks!
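One possible way to reconcile the numbers is sketched below. It assumes that IGV's "Insert size" is the TLEN field of the BAM record and that the aligner computed TLEN as the span from the leftmost to the rightmost aligned base of the pair. Neither assumption is confirmed in this thread, and the mate's CIGAR is not shown, so the mate's aligned length here is only what those assumptions would imply. The function name `template_length` is made up for the example.

```python
def template_length(start_a, end_a, start_b, end_b):
    """Leftmost-to-rightmost span of two alignments (1-based, inclusive).

    Hypothetical helper; assumes both reads map to the same reference.
    """
    return max(end_a, end_b) - min(start_a, start_b) + 1


read_start, read_end = 43_814_926, 43_815_050  # "Reference span" in IGV
mate_start = 43_815_030                        # "Mate start" in IGV
insert_size = 212                              # "Insert size" in IGV

# If the mate aligned over 125 bases, the span would be 229 bp. This is one
# less than 230 because the mate's first base is counted only once.
assumed_mate_end = mate_start + 125 - 1
span_if_125 = template_length(read_start, read_end, mate_start, assumed_mate_end)

# Working backwards from 212 (ASSUMPTION: this read is leftmost and the mate
# ends the template), the mate would end at 43,815,137 and cover 108 bases.
implied_mate_end = read_start + insert_size - 1
implied_mate_span = implied_mate_end - mate_start + 1

print(span_if_125, implied_mate_end, implied_mate_span)  # 229 43815137 108
```

Under these assumptions the mate would have only 108 aligned reference bases, for instance if part of it were soft-clipped as well. The mate's own record would be needed to check this.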

ADD REPLY

Could you post the result of:

samtools view FILE.bam | grep "NS500789:146:H25JHAFXY:1:11311:16523:1330"

where FILE.bam is the name of the BAM file you are viewing in IGV.
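To show which columns of that output are relevant, here is a small sketch that splits a SAM line into its eleven mandatory fields. The example line is not real output. It is reconstructed from the IGV panel above, the FLAG value 163 is a guess (paired, properly paired, mate on the reverse strand, second in pair), and SEQ and QUAL are replaced with "*". The function name `parse_sam_line` is made up for the example.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line):
    """Split one SAM alignment line into its 11 mandatory fields.

    Hypothetical helper for illustration; optional tags are ignored.
    """
    cols = line.rstrip("\n").split("\t")
    rec = dict(zip(SAM_FIELDS, cols[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        rec[key] = int(rec[key])
    # Decode the flag bits that correspond to the IGV panel entries.
    rec["reverse"] = bool(rec["FLAG"] & 0x10)         # this read on (-) strand
    rec["mate_reverse"] = bool(rec["FLAG"] & 0x20)    # mate on (-) strand
    rec["second_in_pair"] = bool(rec["FLAG"] & 0x80)  # "Second in pair"
    return rec


# ILLUSTRATIVE line only: rebuilt from the IGV panel, FLAG 163 is assumed.
example = "\t".join([
    "NS500789:146:H25JHAFXY:1:11311:16523:1330", "163", "chr1",
    "43814926", "70", "26S125M", "=", "43815030", "212", "*", "*",
])

rec = parse_sam_line(example)
print(rec["POS"], rec["CIGAR"], rec["PNEXT"], rec["TLEN"])
print(rec["reverse"], rec["mate_reverse"], rec["second_in_pair"])
```

In real output, POS and CIGAR correspond to "Reference span" and "Cigar" in IGV, PNEXT to "Mate start", and TLEN to "Insert size". The grep should return one line per read of the pair, so the mate's own CIGAR would be visible too.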
