How do I convert a bedGraph file with bins into a GRanges object?
2.4 years ago
Svetlana

Hi Everyone!

I hope you can give me some advice on the following problem.

I have a bedGraph file of ChIP-seq data containing genomic bins and log2 ratios of Condition1 vs. Condition2. The bins are 50 bp, but windows can differ in size depending on the log2 value:

[Screenshot of the bedGraph file: image not available]

This file is the output of bigwigCompare (Galaxy) run on hg18 wiggle files downloaded from the Gene Expression Omnibus (GEO), so I don't have access to the raw reads.

Next, I would like to overlap these bins with another file containing peaks and their locations, annotated with the ChIPpeakAnno R package (for this I used EnsDb.Hsapiens.v75, because those reads were aligned to hg19). How can I deal with the difference in reference genomes (hg18 vs. hg19)?

Any suggestions?

Thanks!

Svetlana

2.4 years ago
seidel

You could convert your bedGraph bins from hg18 to hg19 using liftOver, so that you can overlap them with your peaks. Read the bins into a GRanges object, pass it to the liftOver() function from rtracklayer together with the hg18-to-hg19 chain, and then unlist the result: liftOver() returns a GRangesList with one element per input range, so unlisting gives you back a regular GRanges object. For this you need the liftOver chain file from UCSC, which you can download via the web, FTP, or wget:

# get chain file from UCSC
wget 'http://hgdownload.soe.ucsc.edu/goldenPath/hg18/liftOver/hg18ToHg19.over.chain.gz'

# uncompress it
gunzip hg18ToHg19.over.chain.gz

Then, within R, convert your bedGraph data to a GRanges object; how you do this depends on the format you have it in. I'll assume you have a data frame:

library(GenomicRanges)
library(rtracklayer)

# make some toy data
df <- data.frame(chr=rep("chr1", 4), start=seq(100,250,by=50), end=seq(150,300,by=50), score=rnorm(4, mean=7.5))

# convert it to GRanges
df_gr <- makeGRangesFromDataFrame(df, keep.extra.columns=TRUE)

# import the chain file
chain <- import.chain("hg18ToHg19.over.chain")

# convert from hg18 to hg19
df_hg19_gr <- liftOver(df_gr, chain)

results <- unlist(df_hg19_gr)

Check the results:

> df_gr
GRanges object with 4 ranges and 1 metadata column:
      seqnames    ranges strand |     score
         <Rle> <IRanges>  <Rle> | <numeric>
  [1]     chr1   100-150      * |   9.94368
  [2]     chr1   150-200      * |   8.93145
  [3]     chr1   200-250      * |   7.19089
  [4]     chr1   250-300      * |   5.83782
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> results
GRanges object with 4 ranges and 1 metadata column:
      seqnames      ranges strand |     score
         <Rle>   <IRanges>  <Rle> | <numeric>
  [1]     chr1 10100-10150      * |   9.94368
  [2]     chr1 10150-10200      * |   8.93145
  [3]     chr1 10200-10250      * |   7.19089
  [4]     chr1 10250-10300      * |   5.83782
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
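One detail worth checking with real data: the toy data frame above is treated as 1-based, which is why adjacent bins share a base (100-150, 150-200). A bedGraph file uses 0-based starts, so the coordinates need converting on the way in. Below is a minimal sketch, assuming the uncompressed chain file from the wget/gunzip step and a GRanges of annotated peaks. It writes a small hypothetical bedGraph to a temporary file, reads it with rtracklayer's import() (which converts starts to 1-based), shows the equivalent makeGRangesFromDataFrame() call with starts.in.df.are.0based = TRUE, and wraps the liftOver and overlap steps. The helper names lift_bins() and bins_in_peaks(), the toy scores, and the `peaks` object are illustrative assumptions, not part of the answer above.

```r
library(GenomicRanges)
library(rtracklayer)

# Hypothetical toy bedGraph standing in for the bigwigCompare output.
# bedGraph columns: chrom, 0-based start, end, score
bg_file <- tempfile(fileext = ".bedGraph")
writeLines(c("chr1\t100\t150\t0.8",
             "chr1\t150\t200\t-1.2",
             "chr1\t200\t300\t0.3"), bg_file)

# import() shifts the 0-based starts to 1-based GRanges coordinates
# and stores the fourth column as the "score" metadata column
bins_hg18 <- import(bg_file, format = "bedGraph")

# Same ranges from a data frame, provided the starts are declared 0-based
df <- read.table(bg_file, col.names = c("chr", "start", "end", "score"))
bins_df <- makeGRangesFromDataFrame(df, keep.extra.columns = TRUE,
                                    starts.in.df.are.0based = TRUE)

# Hypothetical helper: lift bins to the new assembly and keep only bins
# that map to exactly one range (0 = unmapped, >1 = split across chain blocks)
lift_bins <- function(bins, chain_file) {
  lifted <- liftOver(bins, import.chain(chain_file))
  n_out <- lengths(lifted)
  message("bins mapped 1:1: ", sum(n_out == 1), " of ", length(bins))
  unlist(lifted[n_out == 1])
}

# Hypothetical helper: keep bins overlapping at least one peak.
# Chromosome names are harmonised first, because Ensembl-based
# annotation may use "1" where the UCSC-derived bins use "chr1".
bins_in_peaks <- function(bins, peaks) {
  seqlevelsStyle(peaks) <- "UCSC"
  subsetByOverlaps(bins, peaks)
}

# Usage, assuming "hg18ToHg19.over.chain" is in the working directory
# and `peaks` is your GRanges of hg19 peaks:
# bins_hg19 <- lift_bins(bins_hg18, "hg18ToHg19.over.chain")
# bins_in_peaks(bins_hg19, peaks)
```

Keeping only 1:1 mappings is one design choice; if you would rather keep split bins, `rep(seq_along(lifted), lengths(lifted))` gives, for each range in the unlisted result, the index of the input bin it came from.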

This is so helpful!! Thank you.
