Question

How do I extract multiple mutations that correspond to particular positions in the reference and are also located on a single read?

5.8 years ago

vellryba • 0

Hello, I am new to this and I have a couple of questions.

Firstly, is there a way to extract the location of each mutation in reference coordinates (so that position 3 of a mutation means position 3 in the reference, not position 3 within the aligned read)? I am interested in whether, for example, one particular mutation is present only when another mutation is also there, so I need to be able to see that both came from the same read.

Secondly, I am not sure about something in the sam file. Here for example:

gi|11111111|ref|TT|_152_620_1:0:0_3:0:0_0   163 gi|11111113|ref|TL| 152 42  70M =   551 469

The 152 refers to the leftmost position of the read. The 551 refers to the leftmost position of the paired read, right? And the 469 is the length of the read? But why can I only see 70 bp (70M)? That gives me only 140 bp for the pair. That's strange. Someone else previously created a SAM file from these sequences and they had longer reads, not just 70M.

Am I forgetting something?

Thank you very much for your help!

Loki

next-gen sequence snp • 1.0k views

5.8 years ago by vellryba • 0

Secondly, I am not sure about something in the sam file.

This is not a SAM record. It has been generated by some random program. We cannot help you here unless you provide more information.

And the 469 is the length of the read?

It could be the distance between the read and its mate.
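To make it easier to see which number is which in the line quoted in the question, here is a minimal Python sketch that names the nine columns shown there and decodes the flag. The column names and flag bits follow the SAM specification; the helper names (`parse_partial_record`, `decode_flag`, `read_length_from_cigar`) are made up for this illustration, and a complete SAM record would also carry SEQ and QUAL as columns 10 and 11.

```python
import re

# Names of the nine columns present in the line from the question.
FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR", "RNEXT", "PNEXT", "TLEN"]

# Flag bits as defined in the SAM specification (subset relevant here).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read on reverse strand",
    0x20: "mate on reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
}

def parse_partial_record(line):
    """Split a whitespace-separated line into the nine named columns."""
    rec = dict(zip(FIELDS, line.split()))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        rec[key] = int(rec[key])
    return rec

def decode_flag(flag):
    """List the names of all flag bits that are set."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]

def read_length_from_cigar(cigar):
    """Sum the CIGAR operations that consume read bases (M, I, S, =, X)."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in "MIS=X")

line = "gi|11111111|ref|TT|_152_620_1:0:0_3:0:0_0   163 gi|11111113|ref|TL| 152 42  70M =   551 469"
rec = parse_partial_record(line)

print(decode_flag(rec["FLAG"]))
# ['paired', 'proper pair', 'mate on reverse strand', 'second in pair']
print(read_length_from_cigar(rec["CIGAR"]), rec["POS"], rec["PNEXT"], rec["TLEN"])
# 70 152 551 469
```

Under this reading, 70 is the length of this one read (from the CIGAR), 152 is where it starts, 551 is where its mate starts, and 469 is a property of the pair rather than of either read.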

5.8 years ago by Pierre Lindenbaum • 161k

This is the full line in the SAM:

gi|11111111|ref|TT|_152_620_1:0:0_3:0:0_0 83 gi|11111113|ref|TL| 551 42 70M = 152 -469 GCGCACCACCCTGCCGTACTAGAGATGACGTAAACGCCAGCACGGACCTGTTGTGCCCCACGGACTGTTT 2222222222222222222222222222222222222222222222222222222222222222222222 AS:i:-9 XN:i:0 XM:i:3 XO:i:0 XG:i:0 NM:i:3 MD:Z:24C4T1C38 YS:i:-3 YT:Z:CP

Is it ok?

What is the read mate? Are they not supposed to overlap?

This is the description of TLEN that I found: Template Length. Only applicable for paired-end sequencing data, TLEN is the size of the original DNA or RNA fragment, determined by examining both of the paired-mates, and counting bases from the left-most aligned base to the right-most aligned base. A value of 0 indicates that TLEN information is not available.
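Coming back to the first question about reference coordinates: the record above carries `MD:Z:24C4T1C38`, and walking that tag from POS gives the reference position of every mismatch on the read. The sketch below is only an illustration, not a vetted tool. The function names `mismatch_ref_positions` and `carries_both` are invented here, and it assumes the alignment has no `N` (skipped region) operations. Note that MD reports the reference base at each mismatch; the read base would have to be looked up in SEQ using the CIGAR.

```python
import re

def mismatch_ref_positions(pos, md):
    """Return [(ref_pos, ref_base), ...] for the mismatches in an MD tag.

    pos is the 1-based leftmost reference position (SAM column 4).
    Numbers in MD are runs of matches, single letters are mismatches
    (the letter is the reference base), and ^LETTERS is a deletion.
    """
    ref_pos = pos
    mismatches = []
    for num, deletion, base in re.findall(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])", md):
        if num:
            ref_pos += int(num)            # matching bases advance the reference
        elif deletion:
            ref_pos += len(deletion)       # deleted reference bases are skipped
        else:
            mismatches.append((ref_pos, base))
            ref_pos += 1
    return mismatches

def carries_both(pos, md, site_a, site_b):
    """True if a single read has mismatches at both reference positions."""
    sites = {p for p, _ in mismatch_ref_positions(pos, md)}
    return site_a in sites and site_b in sites

# Values taken from the record in this thread: POS 551, MD:Z:24C4T1C38.
print(mismatch_ref_positions(551, "24C4T1C38"))
# [(575, 'C'), (580, 'T'), (582, 'C')]
print(carries_both(551, "24C4T1C38", 575, 582))
# True
```

Applying `carries_both` to every record and counting the results per pair of sites would show whether one mutation only appears on reads that also carry the other. A library such as pysam may offer a more robust route than parsing MD by hand.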

5.8 years ago by vellryba • 0

What is the read mate?

https://www.illumina.com/science/technology/next-generation-sequencing/paired-end-vs-single-read-sequencing.html

5.8 years ago by Pierre Lindenbaum • 161k

Thank you, I have paired-end reads. But what do you mean by the distance between the read and the mate? That is what I don't get.

Can you help with my questions, please? Are you happy that this is a SAM file? I have just ended up more confused. According to TLEN, and when I subtract columns 4 and 8, the fragments should be 317 bp, no? Why is it only 70?
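For reference, the arithmetic on the two records in this thread can be sketched as follows, using the TLEN description quoted earlier (leftmost aligned base to rightmost aligned base). The helper names `ref_span` and `template_length` are invented for this example, and aligners may treat edge cases such as soft clipping differently, so this is a sketch rather than a statement of how any particular aligner computes the value.

```python
import re

def ref_span(cigar):
    """Number of reference bases covered by an alignment (M, D, N, =, X)."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in "MDN=X")

def template_length(pos1, cigar1, pos2, cigar2):
    """Bases from the leftmost to the rightmost aligned base of a pair."""
    start = min(pos1, pos2)
    end = max(pos1 + ref_span(cigar1) - 1, pos2 + ref_span(cigar2) - 1)
    return end - start + 1

# The two mates from this thread: one starts at 152, the other at 551, both 70M.
left_pos, left_cigar = 152, "70M"
right_pos, right_cigar = 551, "70M"

print(right_pos - left_pos)                                # 399: column 8 minus column 4
print(template_length(left_pos, left_cigar,
                      right_pos, right_cigar))             # 469: matches TLEN
print(right_pos - (left_pos + ref_span(left_cigar)))       # 329: unsequenced gap between mates
```

With these numbers, each read is 70 bp, the mates cover 152 to 221 and 551 to 620, and the 329 bases in between were never sequenced, so 70 + 329 + 70 = 469. The 70M describes one read, while TLEN describes the whole fragment.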

5.8 years ago by vellryba • 0