Question: value of TLEN not conforming to the SAM specifications
4 weeks ago, gabrielerosso13 wrote:

Hello to all,

I'm studying the SAM specifications, in particular the TLEN field (template length). I found an example of paired-end reads in the SAM file, where the value of TLEN does not conform to the specifications.

read1:
NB501050:47:HHMJVBGXY:1:11203:16016:13370   83  chrM    6622    70  136M    =   6624    -134    CACCCTGAAGTTTATATTCTTATCCTACCAGGCTTCGGAATAATCTCCCATATTGTAACTTACTACTCCGGAAAAAAAGAACCATTTGGATACATAGGTATGGTCTGAGCTATGATATCAATTGGCTTCCTAGGGT    MELNHFFFEGFGBF,FGEJG,FFGJCCIKFGDJGFAE1GECGFFJFIIKFCFGGHCGDJGCDJCDJFIBGHGGGGGGFGGDIKFGGGGHFCDKFCFGHCFGEHFJGHFHJCFGHFCFFKDFFGFHIFFII@FOOON    MD:Z:136

read2:
NB501050:47:HHMJVBGXY:1:11203:16016:13370   163 chrM    6624    70  136M    =   6622    134 CCCTGAAGTTTATATTCTTATCCTACCAGGCTTCGGAATAATCTCCCATATTGTAACTTACTACTCCGGAAAAAAAGAACCATTTGGATACATAGGTATGGTCTGAGCTATGATATCAATTGGCTTCCTAGGGTTT    NIONJEFICGFCECEFGEFCFHFFCHFGJIIFGHBIFGFCGFHFHGGGFCFGKDCGHFGCHFCHEGGBIFGGGGGGJFGHGGFGGKIFFCHGFCJID-KIDHFKAIIFCFKFFCFHGGFGKI=FFH;E,IHMIHH MD:Z:136

In this case there seem to be 2 discrepancies:

1. The value of TLEN should be 138: the 136 mapped bases plus the 2-base offset between the two paired-end reads. Note that all 136 bases are aligned, since CIGAR = 136M.
2. read1 is the leftmost segment (POS = 6622), so its TLEN should be positive rather than negative as in the example.

For point 2 I came to the conclusion that read1 is not considered the leftmost segment because, being on the reverse strand (FLAG = 83), its original start position (as in the FASTQ) would be 6622 + 135 = 6757, i.e. where the read would begin if it had been kept in its original orientation during alignment. So, considering the original strand orientation, this read would be the rightmost segment and would therefore have a negative TLEN. However, if this is the explanation for the negative TLEN of read1, then to my mind the SAM specifications are unclear.
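As a sanity check on the FLAG values and the 6622 + 135 arithmetic above, here is a minimal Python sketch. The labels in FLAG_BITS are my own names for the flag bits listed in the SAM specification, and decode_flag is a hypothetical helper, not part of any library.

```python
# Labels are illustrative names for the SAM FLAG bits, not official identifiers.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


print(decode_flag(83))   # read1: paired, proper_pair, reverse, first_in_pair
print(decode_flag(163))  # read2: paired, proper_pair, mate_reverse, second_in_pair

# Rightmost aligned base of read1, assuming CIGAR 136M consumes 136 reference bases.
read1_end = 6622 + 136 - 1
print(read1_end)  # 6757
```

So read1 is the first read of the pair and is aligned to the reverse strand, which is why its 5' end sits at 6757 rather than at POS.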

Gabriele


What software (and which version) produced this SAM file?

written 4 weeks ago by Pierre Lindenbaum

GATK. However, I'm working directly on the SAM file, so I don't know which choices were made during alignment.

written 4 weeks ago by gabrielerosso13
4 weeks ago, d-cameron (Australia) wrote:

There are many, many SAM files where the content does not conform to the SAM specifications.

In the case of TLEN, the definition was changed in a specification update, but many tools (including GATK) were not updated to reflect this. The spec itself has recently been changed to acknowledge that there are two competing definitions in widespread use (with GATK using the 'old' TLEN definition). See https://github.com/samtools/hts-specs/pull/366

Edit: this means the field itself is essentially useless: either your assumptions will only be valid for the current version of your pipeline, or you will have to recalculate it yourself using your definition of choice.
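To make the two definitions concrete, here is a rough Python sketch that recomputes TLEN for the pair in the question both ways. The function names are hypothetical, the end coordinates assume a pure 136M CIGAR (end = POS + 136 - 1), and the handling of ties is a simplification of mine rather than something taken from the spec or from any tool.

```python
def tlen_five_prime(pos, end, reverse, mate_pos, mate_end, mate_reverse):
    """'Old' style: signed distance between the 5' ends of the two reads."""
    five = end if reverse else pos
    mate_five = mate_end if mate_reverse else mate_pos
    # Add or subtract 1 so the length is inclusive of both 5' bases.
    adjustment = 1 if mate_five >= five else -1
    return mate_five - five + adjustment


def tlen_leftmost_rightmost(pos, end, mate_pos, mate_end):
    """'New' style: leftmost to rightmost mapped base, inclusive."""
    length = max(end, mate_end) - min(pos, mate_pos) + 1
    # Plus sign for the leftmost segment, minus for the rightmost.
    # Assumption: a tie on POS is treated as leftmost here.
    return length if pos <= mate_pos else -length


# Coordinates from the two records in the question (1-based, inclusive).
r1 = dict(pos=6622, end=6622 + 136 - 1, reverse=True)   # FLAG 83
r2 = dict(pos=6624, end=6624 + 136 - 1, reverse=False)  # FLAG 163

old_r1 = tlen_five_prime(r1["pos"], r1["end"], r1["reverse"],
                         r2["pos"], r2["end"], r2["reverse"])
old_r2 = tlen_five_prime(r2["pos"], r2["end"], r2["reverse"],
                         r1["pos"], r1["end"], r1["reverse"])
new_r1 = tlen_leftmost_rightmost(r1["pos"], r1["end"], r2["pos"], r2["end"])
new_r2 = tlen_leftmost_rightmost(r2["pos"], r2["end"], r1["pos"], r1["end"])

print(old_r1, old_r2)  # -134 134
print(new_r1, new_r2)  # 138 -138
```

Under these assumptions the 5'-to-5' calculation reproduces the -134 / 134 in the file, while the leftmost-to-rightmost calculation gives the 138 / -138 the question expected.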
