What does minoverlap = 1L mean?
3.9 years ago

Hi everyone.

I have been using the R package annotatr to annotate my BED files. Its annotate_regions() function has a minoverlap option that sets the minimum overlap required before an annotation is assigned to a BED region. However, the documentation does not say what range of values it accepts, for example whether it is a fraction between 0 and 1.

Also, what does the 1L in minoverlap = 1L represent? What value should I use to require a 50% overlap?

Could anyone with GRanges experience help?

gene R annotatr

3.9 years ago
ATpoint 82k

The L suffix explicitly creates an integer:

> class(1000)
[1] "numeric"
> class(1000L)
[1] "integer"

This can have speed and memory advantages (an R integer takes 4 bytes, a double 8), but you will probably only notice them with very large datasets; see https://stackoverflow.com/questions/7014387/whats-the-difference-between-1l-and-1
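A small base-R sketch of the difference: the two literals compare as equal, but they are stored as different types, and an integer vector needs roughly half the memory of a double vector of the same length (the vector length of one million is an arbitrary choice for illustration).

```r
# 1L is stored as an integer, 1 as a double
x_int <- rep(1L, 1e6)
x_dbl <- rep(1, 1e6)

typeof(x_int)       # "integer"
typeof(x_dbl)       # "double"

# Integers use 4 bytes per element, doubles 8, so x_int is about half the size
object.size(x_int)
object.size(x_dbl)

1L == 1             # TRUE: same value
identical(1L, 1)    # FALSE: different storage type
```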

As for requiring a percentage overlap, I don't think there is a built-in function for this. Here is a custom approach; I'm not sure how well it scales to large datasets:

Output:

> ranges1
GRanges object with 3 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1      1-10      *
  [2]     chr2   100-150      *
  [3]     chr3 1000-5000      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
> 
> ranges2
GRanges object with 3 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1     11-20      *
  [2]     chr2   110-130      *
  [3]     chr3 2000-3000      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
> 
> GetPercentOverlap(query = ranges1, subject = ranges2)
GRanges object with 2 ranges and 2 metadata columns:
      seqnames    ranges strand | percentOverlap   subject
         <Rle> <IRanges>  <Rle> |      <numeric> <IRanges>
  [1]     chr2   100-150      * |        41.1765   110-130
  [2]     chr3 1000-5000      * |        25.0187 2000-3000
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
> 
> GetPercentOverlap(query = ranges2, subject = ranges1)
GRanges object with 2 ranges and 2 metadata columns:
      seqnames    ranges strand | percentOverlap   subject
         <Rle> <IRanges>  <Rle> |      <numeric> <IRanges>
  [1]     chr2   110-130      * |            100   100-150
  [2]     chr3 2000-3000      * |            100 1000-5000
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
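For readers who want to reproduce numbers like these, here is a minimal sketch of one way to compute them with GenomicRanges. It is not the GetPercentOverlap() code from this answer; the helper name percent_overlap() is hypothetical. It assumes the percentage is the number of shared bases divided by the width of the query range, which matches the values above (21/51 and 1001/4001 for the first call).

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Hypothetical helper: for every query/subject pair that overlaps by at least
# one base, report what percentage of the query range is covered by the subject
percent_overlap <- function(query, subject) {
  hits <- findOverlaps(query, subject)   # default: >= 1 shared base
  q <- query[queryHits(hits)]
  s <- subject[subjectHits(hits)]
  # Width of the shared interval (coordinates are closed, hence + 1)
  shared <- pmin(end(q), end(s)) - pmax(start(q), start(s)) + 1L
  mcols(q)$percentOverlap <- 100 * shared / width(q)
  mcols(q)$subject <- ranges(s)
  q
}

ranges1 <- GRanges(c("chr1", "chr2", "chr3"),
                   IRanges(c(1, 100, 1000), c(10, 150, 5000)))
ranges2 <- GRanges(c("chr1", "chr2", "chr3"),
                   IRanges(c(11, 110, 2000), c(20, 130, 3000)))

res <- percent_overlap(ranges1, ranges2)
res_rev <- percent_overlap(ranges2, ranges1)
print(res)
print(res_rev)
```

The adjacent chr1 ranges (1-10 and 11-20) share no base, so findOverlaps() does not report them, just as in the output above.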

Thanks @ATpoint.

The minoverlap argument takes the number of bases that must be shared for an overlap to count. So if your regions all have the same length (say 1000 bp) and you want a 500 bp overlap to count as significant, set the value to 500L. Your script above helped me reach this conclusion.
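To illustrate the point with GenomicRanges directly, the sketch below uses a hypothetical 1000 bp query and two hypothetical annotations that share 500 bp and 400 bp with it. It calls findOverlaps() rather than annotatr::annotate_regions(), assuming both interpret minoverlap as a count of bases. Because that count is absolute, a relative cutoff for regions of varying length has to be applied afterwards; here "50%" is assumed to mean 50% of the query width.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Hypothetical 1000 bp query region and two candidate annotations
query  <- GRanges("chr1", IRanges(1, 1000))
annots <- GRanges(c("chr1", "chr1"), IRanges(c(501, 601), c(1500, 1600)))

# minoverlap is a number of bases, not a fraction:
# 501-1000 shares 500 bp (kept), 601-1000 shares 400 bp (dropped)
hits500 <- findOverlaps(query, annots, minoverlap = 500L)

# Relative cutoff: keep hits whose shared width is >= 50% of the query width
hits <- findOverlaps(query, annots)
q <- query[queryHits(hits)]
s <- annots[subjectHits(hits)]
shared <- pmin(end(q), end(s)) - pmax(start(q), start(s)) + 1L
hits50 <- hits[shared / width(q) >= 0.5]

hits500
hits50
```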
