TLEN sam format
5.4 years ago
hpapoli ▴ 140

Hi,

My question is about the TLEN field in the SAM format. Here is an example of the first nine columns of a BAM file. Is TLEN (320) calculated as follows: (351 + 62) - 93 = 320?

SRR1144953.182159       117     NW_008793873.1  2       0       *       =       2       0
SRR1144953.182159       185     NW_008793873.1  2       60      75M     =       2       0
SRR1144953.8051227      117     NW_008793873.1  14      0       *       =       14      0
SRR1144953.8051227      185     NW_008793873.1  14      60      75M     =       14      0
SRR1144953.1496220      163     NW_008793873.1  93      60      75M     =       351     320
SRR1144953.1496220      83      NW_008793873.1  351     60      62M13S  =       93      -320


Why is TLEN zero for the second and fourth lines?

Thanks!

samtools alignment • 3.4k views

Also, this document is very helpful: http://samtools.github.io/hts-specs/SAMv1.pdf
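The arithmetic for the proper pair in the question can be checked directly. This is a minimal sketch, assuming the definition in the SAM specification that the absolute value of TLEN is the rightmost mapped base minus the leftmost mapped base plus one. The helper names `reference_span` and `template_length` are made up for illustration, and the sign (plus for the leftmost read, minus for the rightmost) is ignored.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases; soft clips (S) do not.
REF_CONSUMING = set("MDN=X")


def reference_span(cigar):
    """Number of reference bases covered by an alignment."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def template_length(pos1, cigar1, pos2, cigar2):
    """Absolute TLEN for two mates mapped to the same reference (1-based POS)."""
    start = min(pos1, pos2)
    end = max(pos1 + reference_span(cigar1), pos2 + reference_span(cigar2)) - 1
    return end - start + 1


# Mates of SRR1144953.1496220: POS 93 with 75M, POS 351 with 62M13S.
print(template_length(93, "75M", 351, "62M13S"))  # 320
```

The mate at position 351 covers reference bases 351 to 412 because the 13 soft-clipped bases are not counted, so 412 - 93 + 1 = 320, which equals (351 + 62) - 93.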

5.4 years ago
Amitm ★ 2.2k

Hi, you can check the meaning of the flags (column 2) here: https://broadinstitute.github.io/picard/explain-flags.html

That's paired-end data, and there are three pairs of read IDs in column 1. In the first two pairs, R1 is unmapped (flag 117 has the 0x4 bit set), so the template length cannot be computed and TLEN is 0 for both mates. The third pair has flag values 163 and 83, which indicate that both reads are mapped in a proper pair.
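The flag values can also be decoded by hand, since FLAG is a bitwise combination of the bits defined in the SAM specification. A minimal sketch follows; the short bit names and the function `decode_flag` are illustrative labels, not part of any library.

```python
# FLAG bits from the SAM specification, with short illustrative names.
FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "read1"),
    (0x80, "read2"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]


# The four distinct flags from the records in the question.
for flag in (117, 185, 163, 83):
    print(flag, decode_flag(flag))
```

Flag 117 decodes to a first-in-pair read that is itself unmapped, and 185 to a second-in-pair read whose mate is unmapped, which matches the TLEN of 0 on those lines.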


This is very clear, thank you! One more question: do the two reads of a pair always appear on adjacent lines? In the example above, we have three pairs, and each pair shares one QNAME. However, the next three lines of the BAM are the following:

SRR1144953.1448091      99      NW_008793874.1  31      60      75M     =       315     359
SRR1144953.21253738     99      NW_008793874.1  74      60      75M     =       366     367
SRR1144953.10190936     99      NW_008793874.1  86      60      75M     =       361     350


From flag 99, I see that each read is paired and mapped in a proper pair. However, why don't I see two lines with the same QNAME above? Sorry if my questions are too simple, but I couldn't figure out these details from the document.


I think I see the pattern. The file is sorted by coordinate, so other reads come in between, and I find the mates on later lines, as below:

SRR1144953.1448091      99      NW_008793874.1  31      60      75M     =       315     359
SRR1144953.21253738     99      NW_008793874.1  74      60      75M     =       366     367
SRR1144953.10190936     99      NW_008793874.1  86      60      75M     =       361     350
SRR1144953.4004472      69      NW_008793874.1  230     0       *       =       230     0
SRR1144953.4004472      137     NW_008793874.1  230     60      75M     =       230     0
SRR1144953.3440089      69      NW_008793874.1  241     0       *       =       241     0
SRR1144953.3440089      137     NW_008793874.1  241     60      75M     =       241     0
SRR1144953.1448091      147     NW_008793874.1  315     60      75M     =       31      -359
SRR1144953.10190936     147     NW_008793874.1  361     60      75M     =       86      -350
SRR1144953.21253738     147     NW_008793874.1  366     60      75M     =       74      -367
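The pairing in these ten records can be confirmed by grouping them on QNAME. This is a minimal sketch that assumes the records are available as text; it splits on any whitespace because the excerpt holds only the first nine columns, whereas a full SAM line is tab-delimited. The function name `group_by_qname` is made up for illustration.

```python
from collections import defaultdict

# The ten coordinate-sorted records from the post, first nine columns only.
RECORDS = """\
SRR1144953.1448091 99 NW_008793874.1 31 60 75M = 315 359
SRR1144953.21253738 99 NW_008793874.1 74 60 75M = 366 367
SRR1144953.10190936 99 NW_008793874.1 86 60 75M = 361 350
SRR1144953.4004472 69 NW_008793874.1 230 0 * = 230 0
SRR1144953.4004472 137 NW_008793874.1 230 60 75M = 230 0
SRR1144953.3440089 69 NW_008793874.1 241 0 * = 241 0
SRR1144953.3440089 137 NW_008793874.1 241 60 75M = 241 0
SRR1144953.1448091 147 NW_008793874.1 315 60 75M = 31 -359
SRR1144953.10190936 147 NW_008793874.1 361 60 75M = 86 -350
SRR1144953.21253738 147 NW_008793874.1 366 60 75M = 74 -367
"""


def group_by_qname(sam_text):
    """Map each QNAME to the (FLAG, POS) of every record that carries it."""
    mates = defaultdict(list)
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.split()
        mates[fields[0]].append((int(fields[1]), int(fields[3])))
    return dict(mates)


for qname, records in group_by_qname(RECORDS).items():
    print(qname, records)
```

Each of the five read IDs ends up with exactly two records. In a pair with an unmapped mate, the unmapped read is given the position of its mapped mate, which is why those two lines stay adjacent after coordinate sorting. If mates need to be adjacent throughout the file, sorting by read name (for example with `samtools sort -n`) is the usual route.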


Hi, some of the aligner's parameters can change how the alignments are reported. One is, as you noted, coordinate-sorted output. Others that come to mind: 1) If secondary alignments (or all valid alignments) are reported, you might see the same read ID more than twice. 2) If only mapped reads are reported and only one mate of a pair was mapped, you would see just one entry for that ID instead of two.

The flag values should be helpful in inferring what is going on.
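As a rough illustration of reading those cases off the flag, the sketch below labels a record by the unmapped (0x4), secondary (0x100), and supplementary (0x800) bits from the SAM specification. The function name `alignment_kind` is made up, and flag 355 is a hypothetical value (99 plus the secondary bit) that does not occur in the records in this thread.

```python
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def alignment_kind(flag):
    """Classify a record from its FLAG value alone."""
    if flag & UNMAPPED:
        return "unmapped"
    if flag & SECONDARY:
        return "secondary"
    if flag & SUPPLEMENTARY:
        return "supplementary"
    return "primary"


# 99, 147, 69 and 137 appear in the thread; 355 is hypothetical.
for flag in (99, 147, 69, 137, 355):
    print(flag, alignment_kind(flag))
```

A read ID that appears more than twice would be expected to have secondary or supplementary records among its lines, while an ID that appears once in a mapped-only file would have the mate-unmapped bit (0x8) set on its single record.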