Hello, I am new to this and I have a couple of questions.
Firstly, is there a way to extract the position of a mutation in reference coordinates (so that position 3 of a mutation means position 3 in the reference, not position 3 within the aligned read)? I would like to see whether, for example, one particular mutation is only present when another mutation is also there, so I need to be able to tell that both came from the same read.
Secondly, I am not sure about something in the SAM file. Here, for example:
gi|11111111|ref|TT|_152_620_1:0:0_3:0:0_0 163 gi|11111113|ref|TL| 152 42 70M = 551 469
The 152 refers to the leftmost position of the read. The 551 refers to the leftmost position of the paired read, right? And is the 469 the length of the read? But then why can I only see 70 bp (70M)? Two reads of 70 bp give me only 140 bp. That's strange. Someone else previously created a SAM file from these sequences and they had longer reads, not just 70M.
Am I forgetting something?
Thank you very much for your help!
Loki
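For the first question, here is a minimal sketch of how a base offset within a read can be translated to a reference coordinate using only POS (column 4) and the CIGAR string (column 6). The function name `read_to_ref_positions` is made up for illustration. The sketch assumes 1-based POS as in the SAM format. If you use pysam, `AlignedSegment.get_aligned_pairs()` gives similar information without hand-parsing.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def read_to_ref_positions(pos, cigar):
    """Return one entry per read base: the 1-based reference position it
    aligns to, or None for inserted or soft-clipped bases."""
    ref = pos  # 1-based leftmost aligned reference position (SAM column 4)
    out = []
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":      # consumes both read and reference
            out.extend(range(ref, ref + n))
            ref += n
        elif op in "IS":     # consumes read only
            out.extend([None] * n)
        elif op in "DN":     # consumes reference only
            ref += n
        # H and P consume neither read nor reference
    return out

positions = read_to_ref_positions(152, "70M")
print(positions[0], positions[2], positions[-1])  # 152 154 221
```

So the third base of the read in the question sits at reference position 154. To check whether two mutations occur together, one option is to collect the reference positions of mismatches per read name (column 1) and then look for read names that carry both.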
This is not a SAM record. It's been generated by a random program. We cannot help you here unless you provide more information.
It could be the distance between the read and the mate.
This is the full line in the SAM:
Is it ok?
What is the read mate? Are they not supposed to overlap?
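On what the "mate" is: in paired-end data each fragment is sequenced from both ends, and the two resulting reads are mates of each other. The FLAG column says which of the two a record is. Below is a rough sketch that splits the nine columns quoted earlier in this thread and decodes FLAG 163. The helper name `decode_flag` and the short labels are my own, while the bit values are the ones defined in the SAM specification. A real SAM line is tab-separated and has more columns (SEQ, QUAL, optional tags) than the excerpt posted here.

```python
# The nine columns quoted in the question, split on whitespace.
record = ("gi|11111111|ref|TT|_152_620_1:0:0_3:0:0_0 163 "
          "gi|11111113|ref|TL| 152 42 70M = 551 469")
FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ",
          "CIGAR", "RNEXT", "PNEXT", "TLEN"]
parsed = dict(zip(FIELDS, record.split()))

# FLAG bits from the SAM specification (labels are illustrative).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]

print(decode_flag(int(parsed["FLAG"])))
# ['paired', 'proper_pair', 'mate_reverse', 'second_in_pair']
```

So this record is the second read of a properly paired fragment, and its mate (starting at PNEXT = 551) is aligned to the reverse strand. Mates may overlap, but they do not have to.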
This is what the description of the TLEN was: Template Length. Only applicable for paired-end sequencing data, TLEN is the size of the original DNA or RNA fragment, determined by examining both of the paired-mates, and counting bases from the left-most aligned base to the right-most aligned base. A value of 0 indicates that TLEN information is not available.
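To make that definition concrete, here is a small worked sketch using the numbers from the record above. It assumes the mate is also aligned as 70M, which is not shown in the posted line, so treat that as a guess. The function names are hypothetical. Note that TLEN in a SAM file is signed (positive for the leftmost read, negative for the rightmost), while this sketch returns only the unsigned size.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def ref_span(cigar):
    """Number of reference bases covered by an alignment (M, D, N, =, X)."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MDN=X")

def template_length(pos, cigar, pnext, mate_cigar):
    """Unsigned template length: leftmost to rightmost aligned base."""
    read_end = pos + ref_span(cigar) - 1
    mate_end = pnext + ref_span(mate_cigar) - 1
    return max(read_end, mate_end) - min(pos, pnext) + 1

pos, pnext = 152, 551
read_end = pos + ref_span("70M") - 1       # 221
mate_end = pnext + ref_span("70M") - 1     # 620 (mate CIGAR is an assumption)
gap = pnext - read_end - 1                 # bases between the two reads

print(template_length(pos, "70M", pnext, "70M"))  # 469
print(gap)                                        # 329
```

Under that assumption the fragment spans reference positions 152 to 620, which is 469 bases: 70 bases from this read, 329 bases in the middle that were never sequenced, and 70 bases from the mate. That is why two 70 bp reads can belong to a 469 bp template. It also matches the `_152_620_` part of the read name.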
https://www.illumina.com/science/technology/next-generation-sequencing/paired-end-vs-single-read-sequencing.html
Thank you, I have paired-end reads. But what do you mean by the distance between the read and the mate? That is what I don't get.
Can you help with my questions, please? Are you satisfied now that this is a SAM file? I have just ended up more confused. According to TLEN, and when I subtract columns 4 and 8, the fragments should be 317 bp, no? Why is it only 70?