R code to calculate total number of bp from gene range
1
0
Entering edit mode
9.1 years ago
M K ▴ 660

I used GenomicRanges function in R and I got the gene range that I am looking for it. The next step is to calculate the total number of base pairs in this specific range. I used coverage function and It gave me long results. I hope if any one can help me with loop function to get the total number of bp in this range.

R • 7.1k views
ADD COMMENT
3
Entering edit mode
9.1 years ago

What form is the result in? If it's a GRanges object, then width() will give you the number of bases per entry. If you happen to have a GRanges object containing the exons from a gene, then sum(width(reduce(gr))) on the GRanges object named gr will give you the total length without double counting anything (the takes care of multiple transcripts or any other sort of overlapping regions).

There are a number of other variations on the above theme that one could think of. If your usage case doesn't match one of those listed above, then just provide more details and I or someone else should be able to easily provide a solution.

ADD COMMENT
0
Entering edit mode

I used the simple example in "An Introduction to Genomic Ranges Classes" I this example, but I didn't understand the results of the coverage function as shown below:

> gr <-GRanges(seqnames =
+ Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
+ ranges =IRanges(1:10, end = 7:16, ),
+ strand =Rle(strand(c("-", "+", "*", "+", "-")),c(1, 2, 2, 3, 2)))
> gr
GRanges object with 10 ranges and 0 metadata columns:
       seqnames    ranges strand
          <Rle> <IRanges>  <Rle>
   [1]     chr1  [ 1,  7]      -
   [2]     chr2  [ 2,  8]      +
   [3]     chr2  [ 3,  9]      +
   [4]     chr2  [ 4, 10]      *
   [5]     chr1  [ 5, 11]      *
   [6]     chr1  [ 6, 12]      +
   [7]     chr3  [ 7, 13]      +
   [8]     chr3  [ 8, 14]      +
   [9]     chr3  [ 9, 15]      -
  [10]     chr3  [10, 16]      -
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
> co= coverage(gr)

This is the part I don't understand:

> co
RleList of length 3
$chr1
integer-Rle of length 12 with 5 runs
  Lengths: 4 1 2 4 1
  Values : 1 2 3 2 1

$chr2
integer-Rle of length 10 with 6 runs
  Lengths: 1 1 1 5 1 1
  Values : 0 1 2 3 2 1

$chr3
integer-Rle of length 16 with 8 runs
  Lengths: 6 1 1 1 4 1 1 1
  Values : 0 1 2 3 4 3 2 1

I mean what is the interpretation of

Lengths: 4 1 2 4 1
Values : 1 2 3 2 1
ADD REPLY
1
Entering edit mode

Ah, the question is essentially, "What the heck is 'run length encoding' anyway?", since that's what's output. Let's just take chr1 as an example. It has 3 entries and covers 12 bases. We can just draw the entries as follows:

-------
    -------
     -------

The left-most position is 1 and the right-most 12. So there are 4 bases (1-4) covered by only the first entry. Another way to state this is that you have a run of length 4 (4 bases) with value (coverage) 1. The 5th base above is covered by the 1st and second entries. This creates a run of length 1 (1 base) with value (coverage) 2. The 6th through 7th bases are covered by all three entries, so you have a run of length 2 and value 3. Entry 1 has now ended and bases 8 through 11 are covered by entries 2 and 3, creating a run of length 4 and value 2. The last base, 12, is only covered by entry 3, so it's a run of length 1 and value 1.

In short, if you were to graph the number of reads covering each position, a run length encoding provides a compact way of storing the coverage and a convenient way to then graph it. "Lengths" would then increment position on the X axis and the "Values" would state where to draw things on the Y axis.

BTW, if you instead wanted to know the total bases per chromosome covered by at least one entry, then sapply(split(gr, seqnames(gr)), function(x) sum(width(reduce(x, ignore.strand=T)))) will work. I'll leave unpacking exactly what that does as a lesson for you :)

ADD REPLY
0
Entering edit mode

Thanks a lot Devon for your help.

ADD REPLY

Login before adding your answer.

Traffic: 2580 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6