I have read through many posts about insert size here, and have seen a very good answer on the topic.
Even so, insert size is still not entirely clear to me. I hope some experts can make it clearer.
As illustrated in a good blog and a good answer, the "insert size" is the sequence between the adapters (so it encompasses R1 and R2 as well as the unknown gap between them). It is also commonly said that the ninth column of the SAM file (TLEN) represents the insert size.
However, here are some things I still don't understand.
First, in RNA-seq data the alignments can be spliced, and TLEN reports the distance from the 5'-most to the 3'-most mapped position (if my understanding is right). So, as I understand it, TLEN will include any introns spanned by the alignment, which means TLEN would usually be longer than the "actual insert size"?
Second, if we are mapping DNA sequences, are the fragment length and the "insert size"/"template length" the same?
Third, how does Picard's CollectInsertSizeMetrics actually calculate the insert size distribution of a paired-end library? Does it use only TLEN, or does it exclude possible introns?
Any answer that helps me better understand this concept will be greatly appreciated.
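To make the first question concrete, here is a minimal Python sketch of how I picture it. It is only an illustration under my own assumptions, not how any aligner or Picard necessarily works: the function names and the example pair (positions and CIGAR strings) are made up, the mates are assumed not to overlap, and only introns visible as `N` operations in the reads' own CIGAR strings are counted (an intron falling in the unsequenced gap between the mates cannot be seen this way).

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def ref_span_and_skipped(cigar):
    """Return (reference bases covered, bases skipped by N operations)."""
    span = 0
    skipped = 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "MDN=X":  # operations that consume the reference
            span += n
        if op == "N":      # skipped region, i.e. an intron in RNA-seq
            skipped += n
    return span, skipped


def genomic_span(pos1, cigar1, pos2, cigar2):
    """Leftmost to rightmost mapped base of the pair (1-based, inclusive).

    This is the TLEN-like value in my understanding; introns are included.
    """
    end1 = pos1 + ref_span_and_skipped(cigar1)[0] - 1
    end2 = pos2 + ref_span_and_skipped(cigar2)[0] - 1
    return max(end1, end2) - min(pos1, pos2) + 1


def span_minus_read_introns(pos1, cigar1, pos2, cigar2):
    """Genomic span minus the N bases seen inside the two reads.

    Assumes non-overlapping mates; introns in the gap are not removed.
    """
    skipped = ref_span_and_skipped(cigar1)[1] + ref_span_and_skipped(cigar2)[1]
    return genomic_span(pos1, cigar1, pos2, cigar2) - skipped


# Hypothetical pair: R1 is spliced over a 1000 bp intron, R2 is unspliced.
print(genomic_span(100, "50M1000N50M", 1300, "100M"))
print(span_minus_read_introns(100, "50M1000N50M", 1300, "100M"))
```

In this made-up example the genomic span is 1300 bp while the span with the read-level intron removed is 300 bp, which is the discrepancy I am asking about.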