Question: SAM file help for a beginner
4 months ago, vellryba wrote:

Hello, I am new to bioinformatics and have a question about SAM files. In the records below, 392 is the leftmost position in the reference genome where the read aligns, and 194 is the position where its mate aligns. The TLEN is 432. How do you get to that number? I have tried to read up on this, but I don't get it.

Thank you!

M03972:51:000000000-BJVL8:1:1101:8198:11811 83  gi|11111113|ref|TL| 392 42  234M    =   194 -432    GCCCAGTGGTAGTGGGCACGACCGACAGGCTTGGAGCGCCCACTTACACGTGGGGGGAGAATGAGACAGATGTCTTCCTATTGAACAGCACTCGACCACCGCTGGGGTCGTGGTTCGGCTGCACGTGGATGAACTCTTCTGGCTACACCAAGACTTGCGGCGCACCACCCTGCCGTACTAGAGCTGACTTCAACGCCAGCACGGACCTGTTGTGCCCCACGGACTGTTTTAGGA  E.FFFFEFB/FEBFFFD=CDBFFFBFFFFFFFFFFFFFFFFBFFFFEAFFAGGGGGGGGGGGGGGGGGGGGGGGEGGHHGHHHHHHHGHGGGGHGGGGGGHGGGGGGGGGHGGGGGHHHHHHHGHHHHHHHHHHHHFFHHHHHHHFEHGHHHHGGGGGGGGHGGGGHGEFGGGGHGHHHHHHHHHHHHHHGGGGGHHGGGHGHHHHHHGHHGGGGGGGGGGGFFBFFFFBBBBB  AS:i:-5 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:109A124    YS:i:0  YT:Z:CP
M03972:51:000000000-BJVL8:1:1101:8198:11811 163 gi|11111113|ref|TL| 194 42  182M    =   392 432 ACTCGTCAGGATGTCCCGAACGCATGTCCGCCTGCCGCAGTATCGAGGCCTTCCGGGTGGGATGGGGCGCCTTGCAATATGAGGATAATGTCACCAATCCAGAGGATATGAGACCCTATTGCTGGCACTACCCACCAAGGCAGTGTGGCGTGGTCTCCGCGAAGACTGTGTGTGGCCCAGTG  BBBBAFBBAFFFGGGGGG2FGGGGGGHHHGGGGHHGGGGGGHHHGGECEFCGHHGGEECGGHHGAEFEEGGGFHHGHFFGHHHHHHGHHHDHGHHHGHHHHHFGGHHHHHHHGHGGHHHHEFEGHGHHHHHHHHGGEHFGHHG<GDDAHGGGGGHHHHD??CFAGGBFBBFFFFDGGEFE;B  AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:182    YS:i:-5 YT:Z:CP

Please refer to the SAM format specification.

written 4 months ago by Sej Modha

Hi, as I said, I have read that. It doesn't explain this to me.

written 4 months ago by vellryba
4 months ago, finswimmer (Germany) wrote:

Hello vellryba,

To understand how TLEN is calculated, you first have to find out the orientation of the reads. This can be done via the flags in column 2. There are lots of web tools out there that translate a flag immediately. You can find one here.
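If you would rather decode the flags yourself than use a web tool, here is a minimal sketch. The bit values are the ones listed for the FLAG field in the SAM specification; the short names and the `decode_flag` helper are made up for this example and are not part of any library.

```python
# Bit meanings of the SAM FLAG field (column 2), per the SAM specification.
# The short names on the right are labels chosen for this example only.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits that are set in a SAM flag."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


for flag in (83, 163):
    print(flag, decode_flag(flag))
# 83 ['paired', 'proper_pair', 'reverse_strand', 'first_in_pair']
# 163 ['paired', 'proper_pair', 'mate_reverse_strand', 'second_in_pair']
```

So the record with flag 83 is the first read of the pair and aligns to the reverse strand, while the record with flag 163 is the second read and aligns forward (its mate is the reversed one).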

From the flags (83 and 163) we now know that the second read is the forward read and the first read is the reverse one. So the fragment starts at position 194. To get the end position, we take the start position of the reverse read and add its aligned length minus 1 (392 + 234 - 1). So the end position is 625.

TLEN is then end position - start position + 1: (625 - 194 + 1) = 432.
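The same arithmetic as a small sketch, in case it helps to see it spelled out. The helper names `reference_length` and `end_position` are invented for this example. The aligned length is taken from the CIGAR string, counting only the operations that consume reference bases (M, D, N, = and X); for a plain `234M` that is simply 234.

```python
import re


def reference_length(cigar):
    """Number of reference bases covered by an alignment, from its CIGAR."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    # Only M, D, N, = and X consume the reference; I, S, H and P do not.
    return sum(int(count) for count, op in ops if op in "MDN=X")


def end_position(pos, cigar):
    """Last reference position (1-based, inclusive) covered by the read."""
    return pos + reference_length(cigar) - 1


fragment_start = 194                      # POS of the forward read
fragment_end = end_position(392, "234M")  # end of the reverse read
tlen = fragment_end - fragment_start + 1

print(fragment_end)  # 625
print(tlen)          # 432
```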

fin swimmer


Dear Fin, thank you very much, this is most helpful. So the first (reverse) read spans positions 392 to 625 and the forward read spans 194 to 375. Positions 376 to 391 are just filled in from the reference, is that right?

Thank you! vell

written 4 months ago by vellryba
4 months ago, ATpoint (Germany) wrote:

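Add 234 (234M = 234 bases match from the start of the read) to the leftmost position of the read, so 234 + 392 = 626. This is one past the last aligned base.

From this, subtract 194 (the leftmost position of the mate), so 626 - 194 = 432 = TLEN.

As the first read aligns to the minus strand and is the rightmost segment of the pair, its TLEN gets a negative sign, so 432 becomes -432. The mate, being the leftmost segment, keeps +432.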
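A rough sketch of this version, including the sign. `signed_tlen` is a name made up for this example; it assumes both mates are mapped to the same reference and start at different positions, and it ignores edge cases where aligners may differ (for example, mates starting at the same position or one read extending past the other).

```python
def signed_tlen(pos, ref_len, mate_pos, mate_ref_len):
    """TLEN for one record of a pair: positive for the leftmost
    segment, negative for the rightmost one.

    Assumes pos != mate_pos (the tie case is not handled here).
    """
    start = min(pos, mate_pos)
    # Exclusive end: one past the last aligned reference base.
    end_exclusive = max(pos + ref_len, mate_pos + mate_ref_len)
    length = end_exclusive - start
    return length if pos < mate_pos else -length


# Record with flag 83: POS 392, 234M, mate at 194 with 182M.
print(signed_tlen(392, 234, 194, 182))  # -432
# Record with flag 163: POS 194, 182M, mate at 392 with 234M.
print(signed_tlen(194, 182, 392, 234))  # 432
```

Both values match column 9 of the two records in the question.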
