Question: value of TLEN not conforming to the SAM specifications
10 months ago by
gabrielerosso130 wrote:

Hello to all,

I'm studying the SAM specification, in particular the TLEN field (template length). I found an example of paired-end reads in a SAM file where the value of TLEN does not seem to conform to the specification.

``````
read1:
NB501050:47:HHMJVBGXY:1:11203:16016:13370   83  chrM    6622    70  136M    =   6624    -134    CACCCTGAAGTTTATATTCTTATCCTACCAGGCTTCGGAATAATCTCCCATATTGTAACTTACTACTCCGGAAAAAAAGAACCATTTGGATACATAGGTATGGTCTGAGCTATGATATCAATTGGCTTCCTAGGGT    MELNHFFFEGFGBF,FGEJG,FFGJCCIKFGDJGFAE1GECGFFJFIIKFCFGGHCGDJGCDJCDJFIBGHGGGGGGFGGDIKFGGGGHFCDKFCFGHCFGEHFJGHFHJCFGHFCFFKDFFGFHIFFII@FOOON    MD:Z:136

NB501050:47:HHMJVBGXY:1:11203:16016:13370   163 chrM    6624    70  136M    =   6622    134 CCCTGAAGTTTATATTCTTATCCTACCAGGCTTCGGAATAATCTCCCATATTGTAACTTACTACTCCGGAAAAAAAGAACCATTTGGATACATAGGTATGGTCTGAGCTATGATATCAATTGGCTTCCTAGGGTTT    NIONJEFICGFCECEFGEFCFHFFCHFGJIIFGHBIFGFCGFHFHGGGFCFGKDCGHFGCHFCHEGGBIFGGGGGGJFGHGGFGGKIFFCHGFCJID-KIDHFKAIIFCFKFFCFHGGFGKI=FFH;E,IHMIHH MD:Z:136
``````

In this case there seem to be two discrepancies:

1. The value of TLEN should be 138: the 136 aligned bases plus the 2-base offset between the two paired-end reads (all 136 bases are aligned, since CIGAR = 136M).
2. read1 is the leftmost segment (POS = 6622), so its TLEN should be positive rather than negative as in the example.

For point 2, I came to the conclusion that read1 is not treated as the leftmost segment because, being reverse (FLAG = 83), the start of the read as sequenced (as in the FASTQ) maps to 6622 + 135 = 6757. So, considering the original strand orientation, this read would be the rightmost segment and would therefore get a negative TLEN. However, if this is the explanation for the negative TLEN of read1, the SAM specification is unclear in my opinion.
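The arithmetic for point 2 can be checked with a minimal sketch. It assumes the CIGAR contains only M operations (as in both reads here), so the reference span equals the read length; `five_prime_pos` is a hypothetical helper name, and only the reverse-strand FLAG bit (0x10) is taken from the SAM specification.

```python
# SAM FLAG bit for "read is reverse complemented" (from the SAM specification).
FLAG_REVERSE = 0x10


def five_prime_pos(pos, flag, ref_len):
    """1-based reference coordinate of the read's 5' end.

    Assumes ref_len is the number of reference bases covered by the
    alignment (136 for a CIGAR of 136M).
    """
    if flag & FLAG_REVERSE:
        # Reverse strand: the 5' end is the rightmost aligned base.
        return pos + ref_len - 1
    # Forward strand: the 5' end is POS itself.
    return pos


read1_5p = five_prime_pos(6622, 83, 136)   # FLAG 83: reverse, first in pair
read2_5p = five_prime_pos(6624, 163, 136)  # FLAG 163: forward, second in pair
print(read1_5p, read2_5p)  # 6757 6624
```

Measured between the two 5' ends, 6624 to 6757 inclusive is 134 bases, which matches the absolute TLEN reported in the file.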

Gabriele


What software (and version) produced this SAM file?

GATK. However, I'm working directly on the SAM file, so I don't know which options were used during alignment.

10 months ago by
d-cameron2.2k
Australia
d-cameron2.2k wrote:

There are many, many SAM files where the content does not conform to the SAM specifications.

In the case of TLEN, the definition was changed in a specification update, but many tools (including GATK) were not updated to reflect this. The spec itself has recently been changed to acknowledge that there are two competing definitions in widespread use (including GATK using the 'old' TLEN definition). See https://github.com/samtools/hts-specs/pull/366

Edit: this means the field itself is essentially useless: either your assumptions will only be valid for the current version of your pipeline, or you're going to have to recalculate it yourself using your definition of choice.
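As a rough illustration of recalculating TLEN yourself, here is a sketch that computes it under both conventions for a mapped pair on the same reference. It assumes the 'old' definition measures from 5' end to 5' end (this assumption reproduces the ±134 in the question) and that the current one measures from the leftmost to the rightmost mapped base. `ref_span` and `tlen_pair` are hypothetical helper names, not part of any library, and ties in position are broken arbitrarily in favour of the first read.

```python
import re


def ref_span(cigar):
    """Number of reference bases consumed by a CIGAR string.

    M, D, N, = and X consume reference; I, S, H and P do not.
    """
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in "MDN=X")


def tlen_pair(pos1, cigar1, rev1, pos2, cigar2, rev2, five_prime=False):
    """Return (TLEN for read 1, TLEN for read 2) under one convention.

    five_prime=False: leftmost mapped base to rightmost mapped base.
    five_prime=True:  5' end to 5' end (assumed 'old' convention).
    """
    end1 = pos1 + ref_span(cigar1) - 1
    end2 = pos2 + ref_span(cigar2) - 1
    if five_prime:
        # A reverse-strand read has its 5' end at its rightmost base.
        a = end1 if rev1 else pos1
        b = end2 if rev2 else pos2
        size = abs(a - b) + 1
    else:
        a, b = pos1, pos2
        size = max(end1, end2) - min(pos1, pos2) + 1
    # The segment that comes first gets the positive value.
    sign = 1 if a <= b else -1
    return sign * size, -sign * size


# The pair from the question: read1 is reverse (FLAG 83), read2 is forward (FLAG 163).
outer = tlen_pair(6622, "136M", True, 6624, "136M", False)
inner = tlen_pair(6622, "136M", True, 6624, "136M", False, five_prime=True)
print(outer)  # (138, -138)
print(inner)  # (-134, 134)
```

The second result matches the values in the file, and the first matches what the question expected from reading the current specification.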