I have some RNA-seq reads in indexed BAM format, which I am converting to BED with bamToBed -split
to understand how skipped regions within the read are handled.
Here is one sample read for some human data, which (according to the CIGAR string 38M88N9M
) contains a skip of 88 bases:
RNAME : chr21
POS - 1 : 9825622
POS - 1 + length(CIGAR) : 9825757
QNAME : HWI-ST700693:268:C1LYCACXX:7:1215:16268:22176
FLAG : 83
16 & FLAG : -
MAPQ : 255
CIGAR : 38M88N9M
RNEXT : =
PNEXT : 9825546
TLEN : -212
SEQ : CCGTTGGTGGCCCCGCCTGGNACCGAACCCGGCCCCGCGTGGGGCCC
QUAL : @B?@88??6&00&6@==;5,#(69@;::6A766:)81@?D@:DD=1<
TAGS : NM:i:2 XS:A:- NH:i:1
When converted to BED with bamToBed -split -bed12
, I get the following end result:
chr21 9825545 9825577 HWI-ST700693:268:C1LYCACXX:7:1215:16268:22176/2 255 + . . .
chr21 9825622 9825757 HWI-ST700693:268:C1LYCACXX:7:1215:16268:22176/1 255 - . . .
Given the PNEXT of 9825546
, I understand why the second read HWI-.../2
starts at 9825545
.
What I don't understand is why this second read stops at 9825577
.
Given the 38M
and 9M
matches, shouldn't it be 9825546 - 1 + 38 + 9
— or 9825592
?
What I am missing about how this second read's coordinates are generated?
Other sample reads I have been looking at seem to follow the rule that the M
elements of the CIGAR string are summed, but this example and a few others do not.
I looked at the source code for bamToBed
, but I seem to be confused about how the BamTools::getBamBlocks
function operates with -split
, in that the N
operand in the CIGAR string appears to break a single BAM read into two separate BAM blocks, after which the blocks should be printed separately.
Instead, the first read HWI-.../1
is printed as a contiguous region across the 88N
skipped region (shouldn't this be two separate blocks of length 38 and 9?). It is also unclear why the second read has the stop value that it has.
Sorry if these are dumb questions and thanks for your knowledge.
Okay, thanks. I think my confusion is about how blocks are presented in BED12 format, as opposed to regular BED.