sam file help for beginner
5.8 years ago
vellryba • 0

Hello, I am new to bioinformatics and have a question about SAM files. In the first record below, 392 is the leftmost position in the reference genome where the read aligns, and 194 is the position where its mate aligns. The TLEN is -432 (432 for the mate). How do you get to that number? I have tried to read up on this but I don't get it.

Thank you!

M03972:51:000000000-BJVL8:1:1101:8198:11811 83  gi|11111113|ref|TL| 392 42  234M    =   194 -432    GCCCAGTGGTAGTGGGCACGACCGACAGGCTTGGAGCGCCCACTTACACGTGGGGGGAGAATGAGACAGATGTCTTCCTATTGAACAGCACTCGACCACCGCTGGGGTCGTGGTTCGGCTGCACGTGGATGAACTCTTCTGGCTACACCAAGACTTGCGGCGCACCACCCTGCCGTACTAGAGCTGACTTCAACGCCAGCACGGACCTGTTGTGCCCCACGGACTGTTTTAGGA  E.FFFFEFB/FEBFFFD=CDBFFFBFFFFFFFFFFFFFFFFBFFFFEAFFAGGGGGGGGGGGGGGGGGGGGGGGEGGHHGHHHHHHHGHGGGGHGGGGGGHGGGGGGGGGHGGGGGHHHHHHHGHHHHHHHHHHHHFFHHHHHHHFEHGHHHHGGGGGGGGHGGGGHGEFGGGGHGHHHHHHHHHHHHHHGGGGGHHGGGHGHHHHHHGHHGGGGGGGGGGGFFBFFFFBBBBB  AS:i:-5 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:109A124    YS:i:0  YT:Z:CP
M03972:51:000000000-BJVL8:1:1101:8198:11811 163 gi|11111113|ref|TL| 194 42  182M    =   392 432 ACTCGTCAGGATGTCCCGAACGCATGTCCGCCTGCCGCAGTATCGAGGCCTTCCGGGTGGGATGGGGCGCCTTGCAATATGAGGATAATGTCACCAATCCAGAGGATATGAGACCCTATTGCTGGCACTACCCACCAAGGCAGTGTGGCGTGGTCTCCGCGAAGACTGTGTGTGGCCCAGTG  BBBBAFBBAFFFGGGGGG2FGGGGGGHHHGGGGHHGGGGGGHHHGGECEFCGHHGGEECGGHHGAEFEEGGGFHHGHFFGHHHHHHGHHHDHGHHHGHHHHHFGGHHHHHHHGHGGHHHHEFEGHGHHHHHHHHGGEHFGHHG<GDDAHGGGGGHHHHD??CFAGGBFBBFFFFDGGEFE;B  AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:182    YS:i:-5 YT:Z:CP
next-gen Assembly alignment • 2.0k views

Please refer to the SAM format specification.


Hi, as I said, I have read that. It doesn't explain this to me.

5.8 years ago

Hello vellryba,

To understand how TLEN is calculated, you first have to find out the orientation of the reads. This can be done via the flags in column 2. There are lots of web tools that translate a flag immediately. You can find one here.
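If you would rather decode the flags yourself than use a web tool, a minimal sketch in Python could look like the one below. The bit meanings come from the FLAG table in the SAM specification. The names `FLAG_BITS` and `decode_flag` are made up for this illustration and are not part of any library.

```python
# Bit meanings of the SAM FLAG field (column 2), as listed in the SAM specification.
FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """Return the descriptions of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# The two flags from the records in the question.
for flag in (83, 163):
    print(flag, decode_flag(flag))
```

For 83 this lists "read reverse strand" and "first in pair"; for 163 it lists "mate reverse strand" and "second in pair".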

Now we know that the second read (flag 163) is the forward read and the first read (flag 83) is the reverse one. So the fragment starts at position 194. To get the end position, we take the start position of the reverse read and add the aligned length of that read minus 1 (392 + 234 - 1). So the end position is 625.

TLEN is then end position - start position + 1, i.e. 625 - 194 + 1 = 432.
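The same arithmetic can be sketched in Python. This is only an illustration under two assumptions: the forward read is the leftmost one and the reverse read reaches furthest to the right, as in your pair. The helper names `reference_length` and `fragment_length` are hypothetical. The aligned length is taken from the CIGAR string by summing the operations that consume reference bases (M, D, N, = and X in the SAM specification), which for `234M` is simply 234.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")


def reference_length(cigar):
    """Number of reference bases covered by an alignment with this CIGAR string."""
    return sum(
        int(count)
        for count, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING
    )


def fragment_length(forward_pos, reverse_pos, reverse_cigar):
    """Unsigned fragment length, assuming the forward read is leftmost."""
    start = forward_pos
    # Last reference position covered by the reverse read (1-based, inclusive).
    end = reverse_pos + reference_length(reverse_cigar) - 1
    return end - start + 1


print(fragment_length(194, 392, "234M"))  # 432
```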

fin swimmer


Dear Fin, thank you very much, this is most helpful. So the first (reverse) read spans positions 392 to 625 and the forward read spans 194 to 375. Positions 376 to 391 are just added from the reference, is that right?

Thank you! vell

5.8 years ago
ATpoint 82k

Add 234 (234M = 234 aligned bases, counted from the start of the alignment; M covers both matches and mismatches) to the leftmost position of the read, so 234 + 392 = 626.

From this subtract the mate's leftmost position, 194, so 626 - 194 = 432 = TLEN.

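As the first read aligns downstream (more 3') of its mate, it is the rightmost segment of the pair, and the SAM specification gives the rightmost segment the negative sign, so 432 turns into -432. Here that read is also the one aligned to the minus strand (flag 83).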
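A minimal sketch of the sign handling, assuming the convention in the SAM specification that the leftmost segment of a pair gets the positive TLEN and the rightmost the negative one. The function name `signed_tlens` is invented for this example. It uses the same "start + length" arithmetic as above, where start + length is one past the last aligned base.

```python
def signed_tlens(pos_a, len_a, pos_b, len_b):
    """Return (tlen_a, tlen_b) for two mates.

    pos_* are 1-based leftmost positions and len_* the number of
    reference bases each alignment covers.
    """
    leftmost = min(pos_a, pos_b)
    # One past the last aligned base of whichever mate reaches furthest right.
    rightmost_end = max(pos_a + len_a, pos_b + len_b)
    span = rightmost_end - leftmost
    # The mate that starts further left gets the positive value.
    # If both start at the same position, the choice of sign is arbitrary.
    if pos_a <= pos_b:
        return span, -span
    return -span, span


# First record (pos 392, 234M) and its mate (pos 194, 182M).
print(signed_tlens(392, 234, 194, 182))  # (-432, 432)
```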
