Question: overlapping GRanges and GAlignments objects - "within" type results depend on "minoverlap" parameter
1
gravatar for jpasulka
9 months ago by
jpasulka10
jpasulka10 wrote:

Hello biostar followers,

I use normally "findOverlaps" function in R to select reads of my interest from my BAM files (reading in R with library GenomicAlignments). This times I received a weird results :

> single_read.gal

GAlignments object with 1 alignment and 5 metadata columns:

  seqnames strand       cigar    qwidth     start       end     width     njunc |                                  qname                     seq        NH        HI        nM
     <Rle>  <Rle> <character> <integer> <integer> <integer> <integer> <integer> |                            <character>          <DNAStringSet> <integer> <integer> <integer>

[1]   chrY      -         51M        51  42301566  42301616        51         0 | HISEQ:125:CAWV3ACXX:6:1107:18815:33205 TCCAGTCCTG...TGTAAGCCAA         1         1         1
-------
seqinfo: 66 sequences from an unspecified genome

> genomic_region.gr

GRanges object with 1 range and 1 metadata column:

  seqnames            ranges strand |      GeneID
     <Rle>         <IRanges>  <Rle> | <character>
[1]   chrY 42301297-42301688      + |      MTA_Mm
-------
seqinfo: 36 sequences from an unspecified genome; no seqlengths

So, the read "single_read.gal" is clearly within the GRanges interval ( 42301297 < 42301566 & 42301616 < 42301688 ), opposite strand, overlapping width = 51 (width of the read).

But, when I run the "findOverlaps" function ignoring the strand info, there is no overlap fulfilling the input requirements :

> findOverlaps( query = single_read.gal, subject = genomic_region.gr, type = "within", minoverlap = 25, ignore.strand = T )

Hits object with 0 hits and 0 metadata columns:
queryHits subjectHits
<integer>   <integer>
-------
queryLength: 1 / subjectLength: 1

It works only with "minoverlap" parameter = 0

> findOverlaps( query = single_read.gal, subject = genomic_region.gr, type = "within", minoverlap = 0, ignore.strand = T )

Hits object with 1 hit and 0 metadata columns:
  queryHits subjectHits
  <integer>   <integer>
[1]         1           1
-------
queryLength: 1 / subjectLength: 1

For any-type overlap works even the former setup :

> findOverlaps( query = single_read.gal, subject = genomic_region.gr, type = "any", minoverlap = 25, ignore.strand = T )

Hits object with 1 hit and 0 metadata columns:
  queryHits subjectHits
  <integer>   <integer>
[1]         1           1
-------
queryLength: 1 / subjectLength: 1

Is that a bug ? Thanks in advance for any respond

> sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8      
[8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] data.table_1.11.4           GenomicAlignments_1.16.0    Rsamtools_1.32.3            Biostrings_2.48.0           XVector_0.20.0              SummarizedExperiment_1.10.1 DelayedArray_0.6.5         
 [8] BiocParallel_1.14.2         matrixStats_0.54.0          Biobase_2.40.0              GenomicRanges_1.32.6        GenomeInfoDb_1.16.0         IRanges_2.14.11             S4Vectors_0.18.3           
[15] BiocGenerics_0.26.0         BiocInstaller_1.30.0       

loaded via a namespace (and not attached):
[1] lattice_0.20-35        bitops_1.0-6           grid_3.5.1             zlibbioc_1.26.0        Matrix_1.2-14          tools_3.5.1            RCurl_1.95-4.11        compiler_3.5.1        
[9] GenomeInfoDbData_1.1.0
findoverlaps R • 366 views
ADD COMMENTlink modified 9 months ago by benformatics890 • written 9 months ago by jpasulka10
3
gravatar for benformatics
9 months ago by
benformatics890
ETH Zurich
benformatics890 wrote:

Seems like a possible bug but also looks to be related to the classes used. If you convert the GAlignments object to GRanges, it works as expected.

## setting up your same objects
genomic_region.gr <- GRanges("chrY:42301297-42301688:+")
single_read.gal <- as(GRanges("chrY:42301566-42301616:-"),'GAlignments')

## test as is
findOverlaps( query = single_read.gal, subject = genomic_region.gr, type = "within", minoverlap = 25, ignore.strand = T )

Hits object with 0 hits and 0 metadata columns:
   queryHits subjectHits
   <integer>   <integer>
  -------
  queryLength: 1 / subjectLength: 1

## test after changing class
single_read.gal <- as(single_read.gal,'GRanges')

findOverlaps( query = single_read.gal, subject = genomic_region.gr, type = "within", minoverlap = 25, ignore.strand = T )

Hits object with 1 hit and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           1
  -------
  queryLength: 1 / subjectLength: 1

You might get a better explanation in the Bioconductor forums.

ADD COMMENTlink modified 9 months ago • written 9 months ago by benformatics890
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 954 users visited in the last hour