What is the difference between "inferred" and "expected" insert size? I think the "inferred" insert size can be calculated from the positions of the read pair on the reference genome, but how is the "expected" insert size determined?
The following is an example copied from IGV showing the information for a read from amplicon-based target enrichment. My questions are:
1) "Read length" is listed as 151 bp: does it include both the sample genomic DNA fragment length (125 bp, listed as "Reference span" below) and the PCR primer length ("Clipping = left 26 soft")?
2) Does "Mate" mean the paired read for this read?
3) Does the "+" sign in "Reference span = chr1:43,814,926-43,815,050 (+) = 125bp" mean that this read is a forward read along the reference sequence strand?
4) Conversely, does the "-" sign in "Mate start = chr1:43,815,030 (-)" mean that the paired read is a reverse read along the reference sequence, and that "Mate start" is actually the end position of the paired read? If so, the genomic coordinate where sequencing really started should be calculable from the "Read length" of 151 bp ==> chr1:43,815,180. Based on the "Reference span" of 126 bp (not including the 25 bp primer) for this read, it seems reasonable to place the sequencing start of this read at chr1:43,814,901. From the start coordinates of this read (chr1:43,814,901) and its paired read (chr1:43,815,180), the insert size [if "Insert size" is defined as the insert (a sample genomic DNA fragment plus a 25 bp PCR primer at each end) between the two universal adaptors ligated at its 2 ends] should be 280 bp (= 43,815,180 - 43,814,901 + 1). However, "Insert size" is listed below as 212 bp.
I think I got the definition (insert, insert size, mate, mate start, reference span, etc.) wrong somewhere here. Would greatly appreciate it if somebody can help to clarify. Thanks so much:
{{{
Read name = NS500789:146:H25JHAFXY:1:11311:16523:1330
Sample = Sample
Library = Sample
Read group = Sample
Read length = 151bp
----------------------
Mapping = Primary @ MAPQ 70
Reference span = chr1:43,814,926-43,815,050 (+) = 125bp
Cigar = 26S125M
Clipping = left 26 soft
----------------------
Location = chr1:43,814,973
Base = T @ QV 14
----------------------
Mate is mapped = yes
Mate start = chr1:43815030 (-)
Insert size = 212
Second in pair
Pair orientation = F2R1
----------------------
}}}
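As a quick check on the "Read length", "Reference span" and "Clipping" fields above, the CIGAR string can be decomposed by hand. The following is a minimal sketch in Python; the helper name `cigar_lengths` is hypothetical (not part of IGV or any library), and it assumes the standard SAM convention that M, I, S, = and X consume read bases while M, D, N, = and X consume reference bases.

```python
import re

# One CIGAR operation: a count followed by an operation letter (SAM convention).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def cigar_lengths(cigar):
    """Return (read_length, reference_span, left_soft_clip) for a CIGAR string.

    Hypothetical helper for illustration only.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Operations that consume bases of the read, soft clips included.
    read_len = sum(n for n, op in ops if op in "MIS=X")
    # Operations that consume positions on the reference, soft clips excluded.
    ref_span = sum(n for n, op in ops if op in "MDN=X")
    # Simplification: only looks at the first operation (ignores a leading hard clip).
    left_clip = ops[0][0] if ops and ops[0][1] == "S" else 0
    return read_len, ref_span, left_clip

read_len, ref_span, left_clip = cigar_lengths("26S125M")
print(read_len, ref_span, left_clip)  # 151 125 26
```

Under that convention, 26S125M gives a 151 bp read of which 125 bp are aligned to the reference, which matches the span chr1:43,814,926-43,815,050 reported by IGV.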
1) yes
2) yes
3) yes
4) yes
The starting position is chr1:43,814,926, as stated by IGV. Soft-clipped bases don't enter the calculation, and you don't have to correct for them.
Hi H. Mon,
Thank you so much for the reply. However, I am still not getting it. Based on your reply, the starting position for the insert size calculation is chr1:43,814,926, as stated by IGV (the 26 soft-clipped bases don't enter the calculation). With the mate start at chr1:43,815,030, as stated by IGV, and assuming a read length of 125 bp for both reads of the pair (not counting the 26 soft-clipped bases; the read length is 151 bp with them included), I calculate Insert size = 230 bp [= (43,815,030 - 43,814,926 + 1) + 125], which still differs from the "Insert size" of 212 bp reported by IGV above. Could you clarify your comment further? Thanks!
Could you post the result of:
where FILE.bam is the name of the BAM file you are viewing with IGV?
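For reference, the insert size arithmetic discussed in this thread can be sketched as below. This is only a sketch under stated assumptions: it takes the SAM-style definition of template length (leftmost aligned base of the forward read to rightmost aligned base of the reverse mate, soft clips excluded), and the function names are hypothetical. The mate's CIGAR is not shown in the IGV popup, so its aligned length is an unknown here; some aligners also compute this field slightly differently.

```python
def template_length(fwd_start, rev_start, rev_ref_span):
    """Leftmost aligned base of the forward read to rightmost aligned base of the mate.

    Hypothetical helper; coordinates are 1-based and inclusive, soft clips excluded.
    """
    rev_end = rev_start + rev_ref_span - 1  # last reference base covered by the mate
    return rev_end - fwd_start + 1

def implied_mate_span(fwd_start, rev_start, insert_size):
    """Aligned mate length that would reproduce a reported insert size (same assumptions)."""
    return fwd_start + insert_size - rev_start

FWD_START = 43_814_926   # "Reference span" start of this read, from IGV
MATE_START = 43_815_030  # "Mate start", from IGV

# Assuming the mate also has 125 aligned bases (an assumption, not known from the popup):
print(template_length(FWD_START, MATE_START, 125))    # 229

# Aligned mate length that would be consistent with "Insert size = 212":
print(implied_mate_span(FWD_START, MATE_START, 212))  # 108
```

Under these assumptions, a 125 bp aligned mate would give 229 rather than 230 (adding 1 and then the full 125 counts the mate's first base twice), and the reported value of 212 would correspond to a mate with only 108 aligned bases, for example if the mate is soft-clipped as well. The mate's actual record in the BAM file is needed to confirm this.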