I am watching https://www.coursera.org/learn/genomic-tools/home/week/3, and I came across the following example line from a SAM file:
141217_CIDR4_0073_BHCFG7ADXX:2:1111:3128:29074 99 chr 10021 0 50M = 10151 180 ...
I have a question about the 9th column, TLEN. The start position of the read above is 10021 and the start position of the mate is 10151. Then the length between the two is 10151-10021+1=131.
QUESTION1: Am I correct? Is this position 0-based?
However, TLEN, which seems to be the insert size, is 180. Why is it like this?
Also, in the SAM spec, I've found 7. RNEXT: Reference sequence name of the primary alignment of the NEXT read in the template.
QUESTION2: What does template mean in this case? Does the template mean the set of two reads that are paired (i.e. a paired-end read)? Can there be more than 2 reads in the template? If so, why?
QUESTION3: Does the next read in the template mean the mate of the read?
I've also found 9. TLEN: signed observed Template LENgth. If all segments are mapped to the same reference, the unsigned observed template length equals the number of bases from the leftmost mapped base to the rightmost mapped base. The leftmost segment has a plus sign and the rightmost has a minus sign. The sign of segments in the middle is undefined. It is set as 0 for single-segment template or when the information is unavailable.
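To make the quoted definition concrete, here is a minimal sketch of how I read it, applied to the example record. The function name `observed_template_length` is hypothetical, and the mate's aligned length of 50 bases is an assumption, since the mate's record is not shown above. This is only an illustration of the definition, not how any particular aligner computes TLEN.

```python
def observed_template_length(pos, ref_len, mate_pos, mate_ref_len):
    """Signed TLEN for the read starting at `pos`, following the quoted definition.

    pos, mate_pos: leftmost mapped positions (the POS and PNEXT columns).
    ref_len, mate_ref_len: number of reference bases each alignment covers
    (50 for a CIGAR of 50M).
    """
    read_end = pos + ref_len - 1
    mate_end = mate_pos + mate_ref_len - 1
    leftmost = min(pos, mate_pos)
    rightmost = max(read_end, mate_end)
    # Unsigned length: bases from the leftmost to the rightmost mapped base.
    unsigned = rightmost - leftmost + 1
    # Leftmost segment gets a plus sign, rightmost a minus sign.
    # Simplification: if both start at the same position, this returns plus for both.
    return unsigned if pos <= mate_pos else -unsigned


# Example record: POS=10021, CIGAR=50M, PNEXT=10151; mate length 50 is assumed.
print(observed_template_length(10021, 50, 10151, 50))   # read shown above
print(observed_template_length(10151, 50, 10021, 50))   # its mate
```

Under that assumption the first call gives 180, which matches the TLEN column, and the second gives -180 for the mate.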
QUESTION4: What is the difference between SIGNED and UNSIGNED observed template length? Could you give me the two lengths for the above example?
QUESTION5: What does "segments in the middle" mean? Is the sign of segments related to the SIGNED template length?
QUESTION6: It says that the leftmost segment has a plus sign and the rightmost has a minus sign. However, in the example above, I have an optional field XS:A:-, which means the given strand is -. Isn't it the leftmost segment though?
There are 6 questions in total. They may be basic questions since I am new to this field. Thank you very much.
Also, for the flag field, why is each bit represented as follows? (0X800)(0X400)(0X200)(0X100) (0X80)(0X40)(0X20)(0X10) (0X8)(0X4)(0X2)(0X1) It seems to be related to hex, but I don't completely understand. Thank you.
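As a small illustration of the hex values listed above, the sketch below treats each one as a single bit and tests which bits are set in the flag 99 from the example record. The bit descriptions are paraphrased from the SAM spec; the names `FLAG_BITS` and `decode_flag` are hypothetical helpers, not part of samtools.

```python
# Each hex value has exactly one bit set, so a flag is just a sum of these values.
FLAG_BITS = {
    0x1: "template has multiple segments (paired)",
    0x2: "each segment properly aligned",
    0x4: "segment unmapped",
    0x8: "next segment unmapped",
    0x10: "SEQ reverse complemented",
    0x20: "SEQ of next segment reverse complemented",
    0x40: "first segment in template",
    0x80: "last segment in template",
    0x100: "secondary alignment",
    0x200: "not passing filters",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """Return the descriptions of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


flag = 99  # FLAG column of the example record
print(hex(flag), bin(flag))
for description in decode_flag(flag):
    print(description)
```

Here 99 = 0x40 + 0x20 + 0x2 + 0x1 (64 + 32 + 2 + 1), so four of the twelve bits are set.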
I've added some structure and highlighting to your question. Please do that in the future as well to improve the readability and overall impression of your question. You'll see that this increases your chance of good responses.