Question: Analyzing Batman Methylation Data In R
1
gravatar for Sirus
7.5 years ago by
Sirus790
Boston/USA
Sirus790 wrote:

Hi guys, I am interested in dataset GSE37022 where they provide a GFF file that contains the methylation level for each samples accross the whole genome using a sliding window of 100bp. Basically the GFF file looks like this:

1       batman  meth    301     400     0.968000        .       .       ROI "CHR1"; batman.iqr 0.00000
1       batman  meth    401     500     0.968000        .       .       ROI "CHR1"; batman.iqr 0.00000
1       batman  meth    501     600     0.979957        .       .       ROI "CHR1"; batman.iqr 0.00243478
1       batman  meth    601     700     0.983667        .       .       ROI "CHR1"; batman.iqr 0.00143590
1       batman  meth    701     800     0.722364        .       .       ROI "CHR1"; batman.iqr 0.00327273

I am just what is the best way to calculate the methylation level in the promoter regions? are there any pre-existing packages ? I googled and tried to check in bioconductor but it seems there is any. if not, do any one know how to associate the promoter regions obtained using the TxDb.Hsapiens.UCSC.hg19.knownGene library to genes.

  > require(TxDb.Hsapiens.UCSC.hg19.knownGene)
  > hg19.known<-TxDb.Hsapiens.UCSC.hg19.knownGene;
  > prom<-promoters(hg19.known,200,200)
  > prom
GRanges with 80922 ranges and 2 metadata columns:
          seqnames               ranges strand   |     tx_id     tx_name
             <Rle>            <IRanges>  <Rle>   | <integer> <character>
      [1]     chr1     [ 11674,  12073]      +   |         1  uc001aaa.3
      [2]     chr1     [ 11674,  12073]      +   |         2  uc010nxq.1
      [3]     chr1     [ 11674,  12073]      +   |         3  uc010nxr.1
      [4]     chr1     [ 68891,  69290]      +   |         4  uc001aal.1
      [5]     chr1     [320884, 321283]      +   |         5  uc001aaq.2
      [6]     chr1     [320946, 321345]      +   |         6  uc001aar.2
      [7]     chr1     [321837, 322236]      +   |         7  uc009vjk.2
      [8]     chr1     [323692, 324091]      +   |         8  uc001aau.3
      [9]     chr1     [324088, 324487]      +   |         9  uc021oeh.1
      ...      ...                  ...    ... ...       ...         ...
  [80914]     chrY [27330721, 27331120]      -   |     76942  uc004fwt.3
  [80915]     chrY [27539596, 27539995]      -   |     76943  uc022cpa.1
  [80916]     chrY [27603300, 27603699]      -   |     76944  uc022cpb.1
  [80917]     chrY [27604180, 27604579]      -   |     76945  uc011nbx.1
  [80918]     chrY [27605479, 27605878]      -   |     76946  uc004fwx.1
  [80919]     chrY [27606222, 27606621]      -   |     76947  uc022cpc.1
  [80920]     chrY [27607233, 27607632]      -   |     76948  uc004fwz.3
  [80921]     chrY [27635755, 27636154]      -   |     76949  uc022cpd.1
  [80922]     chrY [59360655, 59361054]      -   |     76950  uc011ncc.1
  ---
  seqlengths:
                    chr1                  chr2                  chr3                  chr4 ...        chrUn_gl000247        chrUn_gl000248        chrUn_gl000249
               249250621             243199373             198022430             191154276 ...                 36422                 39786                 38502

Thanks in advance,

R methylation • 2.4k views
ADD COMMENTlink modified 7.5 years ago by David740 • written 7.5 years ago by Sirus790
1
gravatar for David
7.5 years ago by
David740
David740 wrote:

Associating a promoter region (or genomic range) to a gene is regularly done when dealing with ChIP-seq data. Your prom seems to be a RangeData object from iRange.

## expected output:
> class(prom)
# [1] "RangedData"
# attr(,"package")
# [1] "IRanges"

If so you can plug it in the ChIPpeakAnno package and obtain an association to gene data.frame

library(ChIPpeakAnno)
library(biomaRt)
library(org.Mm.eg.db)

## For mouse annotations. look into the vignette('ChIPpeakAnno') for human and other org
## This is MM9 referecne genome
# data(TSS.mouse.NCBIM37)

## Building the MM10 annotations
library(biomaRt)
# ensembl=useMart("ensembl")
# listDatasets(ensembl)
ensembl = useMart("ensembl", dataset="mmusculus_gene_ensembl")
# listFilters(ensembl)
# listAttributes(ensembl)
TSS.mouse.GRCm38 <- getAnnotation(ensembl, featureType = "TSS")

annotatedPromoters <- annotatePeakInBatch(prom, AnnotationData=TSS.mouse.GRCm38, output="both")

## look at the results
head(annotatedPromoters)
ADD COMMENTlink written 7.5 years ago by David740

Hi David, thanks for your answer. So you mean first take the gff file generated by batman and make an intersection with the promoters region then annotate the result. is it?

ADD REPLYlink written 7.5 years ago by Sirus790

Yes if you want to interpret the result of batman through promoter region annotations. Intersect is the principle I had in mind. The implementation could be more complex as maybe methylation region could span over more than one promoter.

ADD REPLYlink written 7.5 years ago by David740

Hi david I ended up doing it with with BEDops tool, as the annotatePeakInBatch was slow, the files are a bit big 1.3G each

ADD REPLYlink written 7.5 years ago by Sirus790
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1289 users visited in the last hour