Sam file arrangement
7 weeks ago
aenna_p

Hello,

I have a question about the lengths of reads taken from BAM files. I converted BAM files into BED files and kept the read sequence, so they look something like this:

Chr 6791    7891    TCGAATATCAGGGTGCCCTCTGGCAAGGGCTTGCCCAGCGTACGTCAC    -
Chr 6966    7304    ATTGATGAGGGATGTGGGTGGATGGATGATGATGGAAATATGATATGC    +

I always assumed that columns 2 and 3 give the start and end positions of the read alignment, so column 3 minus column 2 should be the read length. However, if I count the characters in the DNA string (column 4) with the R function nchar(), I get a different value.
Can anyone explain what I am missing?
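To make the comparison concrete, here is a minimal Python sketch that does the same calculation on the two example lines. It assumes the custom layout shown above (chrom, start, end, read sequence, strand, whitespace-separated), which is not standard BED6; the function name `span_vs_length` is hypothetical. BED coordinates are 0-based and half-open, so end minus start is the number of reference bases the alignment covers.

```python
# Hypothetical layout, as in the example lines above:
# chrom, start, end, read sequence, strand (not standard BED6)
BED_LINES = [
    "Chr 6791    7891    TCGAATATCAGGGTGCCCTCTGGCAAGGGCTTGCCCAGCGTACGTCAC    -",
    "Chr 6966    7304    ATTGATGAGGGATGTGGGTGGATGGATGATGATGGAAATATGATATGC    +",
]

def span_vs_length(line):
    """Return (reference span, read sequence length) for one line."""
    _chrom, start, end, seq, _strand = line.split()
    # BED is 0-based, half-open: end - start = bases covered on the reference
    span = int(end) - int(start)
    return span, len(seq)

for line in BED_LINES:
    span, length = span_vs_length(line)
    print(f"span={span} read_length={length}")
```

For these two lines both sequences are 48 nt long, while the spans are 1100 and 338, so the two numbers clearly measure different things.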

Thank you!

BAM BED
7 weeks ago
ATpoint

Alignment length != read length. Reads may have been soft-clipped, and parts of a read may align elsewhere, depending on how the aligner handles clipping and non-primary alignments.
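One way to see this per read is to look at the CIGAR string of each BAM record. Per the SAM specification, the operations M, I, S, = and X consume read bases, while M, D, N, = and X consume reference bases, so the two lengths can differ in either direction. The sketch below is a minimal illustration, assuming you have the CIGAR string at hand; the function name `lengths_from_cigar` and the example CIGARs are made up for illustration.

```python
import re

# SAM spec: operations that consume the read (query) and the reference.
# H (hard clip) and P (padding) consume neither.
QUERY_OPS = set("MIS=X")
REF_OPS = set("MDN=X")

def lengths_from_cigar(cigar):
    """Return (read_length, reference_span) implied by a CIGAR string."""
    read_len = ref_span = 0
    for count, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        count = int(count)
        if op in QUERY_OPS:
            read_len += count
        if op in REF_OPS:
            ref_span += count
    return read_len, ref_span

print(lengths_from_cigar("5S43M"))      # soft clip: span shorter than read
print(lengths_from_cigar("21M2D22M"))   # deletion: span longer than read
print(lengths_from_cigar("10M1000N38M"))  # skipped region (e.g. spliced read)
```

Soft-clipped bases stay in the read sequence but are not part of the aligned span, whereas deletions (D) and skipped regions (N) add reference bases that are not in the read.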


Thank you! I do understand why read length may be larger than alignment length. But I still do not understand how sometimes the alignment length can be larger than the read length. Can you explain this further?

Reference:  GATCGATCACTGACGTATCTAGGCGATCAGTCGTACGTATCACTA
Read:       GATCGATCACTGACGTATCTA--CGATCAGTCGTACGTATCACTA

Here is a simple example: the read has a 2-bp deletion relative to the reference. Because the start and end of the alignment on the reference define the coordinates, the alignment spans two bp more than the read length (45 bp versus 43 bp).
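The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch that writes the example alignment from this reply as two equal-length rows, using "-" for the two deleted bases; the function name `span_and_length` is hypothetical. Counting non-gap characters in each row gives the reference span and the read length.

```python
# The example alignment, with the two deleted bases written as "-"
REFERENCE = "GATCGATCACTGACGTATCTAGGCGATCAGTCGTACGTATCACTA"
READ      = "GATCGATCACTGACGTATCTA--CGATCAGTCGTACGTATCACTA"

def span_and_length(ref_row, read_row):
    """Return (reference span, read length) for a gapped pairwise alignment."""
    if len(ref_row) != len(read_row):
        raise ValueError("alignment rows must have equal length")
    ref_span = sum(c != "-" for c in ref_row)    # bases covered on the reference
    read_len = sum(c != "-" for c in read_row)   # bases actually present in the read
    return ref_span, read_len

print(span_and_length(REFERENCE, READ))
```

The reference span (what end minus start measures) counts the deleted bases, while nchar() on the read sequence does not.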


That was very well simplified! Thanks!
