cannot understand coverage function in Granges package
1
1
Entering edit mode
6 weeks ago
alexmondaini ▴ 20

Hello everyone,

I understand the concept behind a run length encoded vector as being just a short representation of a long value repeated vector.

For example :

> rl = Rle(c(1,1,1,1,2,2,2,3,1,1))
> rl
numeric-Rle of length 10 with 4 runs
Lengths: 4 3 1 2
Values : 1 2 3 1


means for the 1 value I have a lenght of 4 for the value of 2 I have a length of 3 an so on.

Now if we construct a Granges object:

gr1 <- GRanges(seqnames="chr2", ranges=IRanges(c(3,10), c(6,16)),
strand="+")


that looks like that:

 gr1
GRanges object with 2 ranges and 0 metadata columns:
seqnames    ranges strand
<Rle> <IRanges>  <Rle>
[1]     chr2       3-6      +
[2]     chr2     10-16      +
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths


and we ask for the coverage of this object we get:

> coverage(gr1)
RleList of length 1
$chr2 integer-Rle of length 16 with 4 runs Lengths: 2 4 3 7 Values : 0 1 0 1  I honestly struggle to understand the values and lengths here, I have been reading the manual and other tutorials but haven't found a single place which in detail goes one by one on how coverage is computed and how the Rle finally gives these values and lengths. If someone could help me out explaining why the first length is 2 and the first value is 0 in this case would be great. thanks Granges • 228 views ADD COMMENT 2 Entering edit mode 6 weeks ago ATpoint 57k The coverage() output represents the coverage of the entire chromosome. Since your GRanges object starts at 3 means that the positions 1 and 2 have coverage of zero, therefore the Values 0 of Lengths 2. In your case the "chr2" is of length 16 because 16 is the largest end coordinate you entered. You can set a seqlength though. If you tell gr1 that e.g. chr2 is of length 1000 then this would also be represented in coverage: > seqlengths(gr1) <- 1000 > coverage(gr1) RleList of length 1$chr2
integer-Rle of length 1000 with 5 runs
Lengths:   2   4   3   7 984
Values :   0   1   0   1   0


16 is the last base that is coverered, so 1000-16=984, therefore Lengths 984 and Values 0 because all these bases after 16 are uncovered.

Does that make sense to you?

0
Entering edit mode

oh yes, now it does.This is an absolute coverage of the genome in question right? it definitely makes more sense when you give a seq length to every chromosome. I'm wondering now if it makes sense to use this function to get the coverage of one granges object relative to another granges object. I believe bestools coverage behave this way with several different bed files but here for granges it seems like coverage should be used to compare the Grange object with its entire length.