                    8.4 years ago
        Chirag Parsania
        
    
Hi,
I have two GRanges objects (in R), a and b. Each feature in b lies within the corresponding feature in a.
> a
GRanges object with 2 ranges and 0 metadata columns:
                    seqnames           ranges strand
                       <Rle>        <IRanges>  <Rle>
  [1] ChrD_C_glabrata_CBS138 [451956, 454735]      +
  [2] ChrD_C_glabrata_CBS138 [451956, 454735]      +
  -------
  seqinfo: 14 sequences from an unspecified genome; no seqlengths
> b 
GRanges object with 2 ranges and 0 metadata columns:
                    seqnames           ranges strand
                       <Rle>        <IRanges>  <Rle>
  [1] ChrD_C_glabrata_CBS138 [452667, 454092]      +
  [2] ChrD_C_glabrata_CBS138 [452667, 454092]      +
  -------
  seqinfo: 14 sequences from an unspecified genome; no seqlengths
findOverlaps from the GenomicRanges package gives the following output:
> findOverlaps(a,b)
Hits object with 4 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           1
  [2]         1           2
  [3]         2           1
  [4]         2           2
  -------
  queryLength: 2 / subjectLength: 2
I want subject hits only if more than 90% of the query is covered. I tried the minoverlap argument of findOverlaps, but without success. In other words, for a given query feature, how do I find what percentage of the query overlaps each subject hit?
Here each range in b covers only about 51% of its range in a (1426 of 2780 bases), so the expected output should contain no hits:
> findOverlaps(a,b)
Hits object with 0 hits and 0 metadata columns:
   queryHits subjectHits
   <integer>   <integer>
  -------
  queryLength: 2 / subjectLength: 2
~Chirag.
You can use the pintersect function from the IRanges package.
http://svitsrv25.epfl.ch/R-doc/library/IRanges/html/IRanges-setops.html
One solution I found is here:
https://support.bioconductor.org/p/72656/
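A minimal sketch of the pintersect approach, assuming the GenomicRanges package is installed and rebuilding a and b from the coordinates in the question. The names hits, overlaps, frac and kept are only illustrative, and this may not match the linked solution exactly:

```r
library(GenomicRanges)

# Rebuild the two objects from the coordinates shown in the question
a <- GRanges("ChrD_C_glabrata_CBS138",
             IRanges(start = c(451956, 451956), end = c(454735, 454735)),
             strand = "+")
b <- GRanges("ChrD_C_glabrata_CBS138",
             IRanges(start = c(452667, 452667), end = c(454092, 454092)),
             strand = "+")

# All overlapping (query, subject) pairs
hits <- findOverlaps(a, b)

# Intersect each pair element-wise; both inputs have one range per hit
overlaps <- pintersect(a[queryHits(hits)], b[subjectHits(hits)])

# Fraction of each query range covered by its subject hit
frac <- width(overlaps) / width(a[queryHits(hits)])

# Keep only the hits where more than 90% of the query is covered
kept <- hits[frac > 0.9]
kept
```

With the ranges above, every pair has a coverage fraction of 1426 / 2780 (about 0.51), so kept should be empty. To require coverage of the subject instead of the query, divide by width(b[subjectHits(hits)]).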