Question

Analyzing Batman Methylation Data In R

1

Entering edit mode

11.0 years ago

Sirus ▴ 820

Hi guys, I am interested in dataset GSE37022 where they provide a GFF file that contains the methylation level for each samples accross the whole genome using a sliding window of 100bp. Basically the GFF file looks like this:

1       batman  meth    301     400     0.968000        .       .       ROI "CHR1"; batman.iqr 0.00000
1       batman  meth    401     500     0.968000        .       .       ROI "CHR1"; batman.iqr 0.00000
1       batman  meth    501     600     0.979957        .       .       ROI "CHR1"; batman.iqr 0.00243478
1       batman  meth    601     700     0.983667        .       .       ROI "CHR1"; batman.iqr 0.00143590
1       batman  meth    701     800     0.722364        .       .       ROI "CHR1"; batman.iqr 0.00327273

I am just what is the best way to calculate the methylation level in the promoter regions? are there any pre-existing packages ? I googled and tried to check in bioconductor but it seems there is any. if not, do any one know how to associate the promoter regions obtained using the TxDb.Hsapiens.UCSC.hg19.knownGene library to genes.

  > require(TxDb.Hsapiens.UCSC.hg19.knownGene)
  > hg19.known<-TxDb.Hsapiens.UCSC.hg19.knownGene;
  > prom<-promoters(hg19.known,200,200)
  > prom
GRanges with 80922 ranges and 2 metadata columns:
          seqnames               ranges strand   |     tx_id     tx_name
             <Rle>            <IRanges>  <Rle>   | <integer> <character>
      [1]     chr1     [ 11674,  12073]      +   |         1  uc001aaa.3
      [2]     chr1     [ 11674,  12073]      +   |         2  uc010nxq.1
      [3]     chr1     [ 11674,  12073]      +   |         3  uc010nxr.1
      [4]     chr1     [ 68891,  69290]      +   |         4  uc001aal.1
      [5]     chr1     [320884, 321283]      +   |         5  uc001aaq.2
      [6]     chr1     [320946, 321345]      +   |         6  uc001aar.2
      [7]     chr1     [321837, 322236]      +   |         7  uc009vjk.2
      [8]     chr1     [323692, 324091]      +   |         8  uc001aau.3
      [9]     chr1     [324088, 324487]      +   |         9  uc021oeh.1
      ...      ...                  ...    ... ...       ...         ...
  [80914]     chrY [27330721, 27331120]      -   |     76942  uc004fwt.3
  [80915]     chrY [27539596, 27539995]      -   |     76943  uc022cpa.1
  [80916]     chrY [27603300, 27603699]      -   |     76944  uc022cpb.1
  [80917]     chrY [27604180, 27604579]      -   |     76945  uc011nbx.1
  [80918]     chrY [27605479, 27605878]      -   |     76946  uc004fwx.1
  [80919]     chrY [27606222, 27606621]      -   |     76947  uc022cpc.1
  [80920]     chrY [27607233, 27607632]      -   |     76948  uc004fwz.3
  [80921]     chrY [27635755, 27636154]      -   |     76949  uc022cpd.1
  [80922]     chrY [59360655, 59361054]      -   |     76950  uc011ncc.1
  ---
  seqlengths:
                    chr1                  chr2                  chr3                  chr4 ...        chrUn_gl000247        chrUn_gl000248        chrUn_gl000249
               249250621             243199373             198022430             191154276 ...                 36422                 39786                 38502

Thanks in advance,

methylation r • 3.2k views

ADD COMMENT • link updated 11.0 years ago by David ▴ 740 • written 11.0 years ago by Sirus ▴ 820

score 1 · Answer 1 · 2013-04-19

1

Entering edit mode

11.0 years ago

David ▴ 740

Associating a promoter region (or genomic range) to a gene is regularly done when dealing with ChIP-seq data. Your prom seems to be a RangeData object from iRange.

## expected output:
> class(prom)
# [1] "RangedData"
# attr(,"package")
# [1] "IRanges"

If so you can plug it in the ChIPpeakAnno package and obtain an association to gene data.frame

library(ChIPpeakAnno)
library(biomaRt)
library(org.Mm.eg.db)

## For mouse annotations. look into the vignette('ChIPpeakAnno') for human and other org
## This is MM9 referecne genome
# data(TSS.mouse.NCBIM37)

## Building the MM10 annotations
library(biomaRt)
# ensembl=useMart("ensembl")
# listDatasets(ensembl)
ensembl = useMart("ensembl", dataset="mmusculus_gene_ensembl")
# listFilters(ensembl)
# listAttributes(ensembl)
TSS.mouse.GRCm38 <- getAnnotation(ensembl, featureType = "TSS")

annotatedPromoters <- annotatePeakInBatch(prom, AnnotationData=TSS.mouse.GRCm38, output="both")

## look at the results
head(annotatedPromoters)

ADD COMMENT • link 11.0 years ago by David ▴ 740

0

Entering edit mode

Hi David, thanks for your answer. So you mean first take the gff file generated by batman and make an intersection with the promoters region then annotate the result. is it?

ADD REPLY • link 11.0 years ago by Sirus ▴ 820

0

Entering edit mode

Yes if you want to interpret the result of batman through promoter region annotations. Intersect is the principle I had in mind. The implementation could be more complex as maybe methylation region could span over more than one promoter.

ADD REPLY • link 11.0 years ago by David ▴ 740

0

Entering edit mode

Hi david I ended up doing it with with BEDops tool, as the annotatePeakInBatch was slow, the files are a bit big 1.3G each

ADD REPLY • link 11.0 years ago by Sirus ▴ 820