Question: Mapping and annotating DNA binding regions from ChIP-Seq to nearby gene
oriolebaltimore (United States) wrote, 28 days ago:

Hi: I have a BED file of transcription factor binding sites (mm9). I want to annotate these sites, which are typically located in intergenic regions, to a nearby gene. I checked a publication (PMC3080774) that lists some tools that can map and annotate. I tried using ChIPpeakAnno. It was straightforward for the first two commands, and then it became confusing.

    library(ChIPpeakAnno)
    bed <- system.file("extdata", "MACS_output.bed", package="ChIPpeakAnno")
    gr1 <- toGRanges(bed, format="BED", header=FALSE)
    ## one can also try import from rtracklayer
    gff <- system.file("extdata", "GFF_peaks.gff", package="ChIPpeakAnno")
    gr2 <- toGRanges(gff, format="GFF", header=FALSE, skip=3)
    ## must keep the class exactly same as gr1$score, i.e., numeric.
    gr2$score <- as.numeric(gr2$score)
    ol <- findOverlapsOfPeaks(gr1, gr2)
    ## add metadata (mean of score) to the overlapping peaks

The authors do not explain why they overlap gr1 and gr2. Just before that, they say "## one can also try import from rtracklayer". They also do not say how to name the annotation to use, e.g. mm8 or hg19.

My questions to this forum are:

  1. Given a BED file, how can I map and annotate the peak regions to nearby genes?
  2. Can I define a range of 1 kb or 2 kb? (Apparently ChIPpeakAnno uses GRanges, so this should be possible; however, the lack of an mm9 example and of any explanation of why overlaps are computed between gr1 and gr2 makes it confusing to deal with ranges.)
  3. Are there any tools other than ChIPpeakAnno that can achieve the above two tasks?

Thanks Adrian


Appreciate your suggestions. I will work on both. Thanks!

Comment by oriolebaltimore, 25 days ago

Moved your answer to a comment at the top level.

Okay, hope that it goes well. If your data is in the right format, then toralmanvar's suggestion may be easier to use. You had mentioned the use of ChIPpeakAnno in your question, though.

Reply by Kevin Blighe, 25 days ago

Kevin Blighe (Republic of Ireland) wrote, 27 days ago:

You can just use GenomicRanges with your peaks BED file and your annotation GFF (after they have both been converted to GRanges objects). The findOverlaps() function in GenomicRanges has a parameter called maxgap, which allows you to set your distances of 1 kb, 2 kb, etc.
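
As a rough illustration of the maxgap idea, here is a minimal sketch on toy data. The peak and gene coordinates, the names peak1/geneA etc., the `window` variable and the "peaks.bed" file name are all made up for the example and do not come from the question; with real data you would build the two GRanges objects from your BED and GFF/GTF files instead.

```r
library(GenomicRanges)

# Toy peaks standing in for the BED file (all coordinates are invented).
# With real data one could use rtracklayer::import("peaks.bed") instead;
# that file name is a placeholder.
peaks <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges = IRanges(start = c(1000, 20000, 5000),
                   end   = c(1200, 20300, 5400)),
  name = c("peak1", "peak2", "peak3")
)

# Toy genes standing in for the GFF/GTF annotation (also invented).
genes <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges = IRanges(start = c(1700, 50000, 9000),
                   end   = c(3000, 52000, 9500)),
  strand = c("+", "-", "+"),
  gene_id = c("geneA", "geneB", "geneC")
)

# Peaks that overlap a gene or lie within `window` bp of one.
window <- 1000
hits <- findOverlaps(peaks, genes, maxgap = window, ignore.strand = TRUE)
within_window <- data.frame(
  peak = peaks$name[queryHits(hits)],
  gene = genes$gene_id[subjectHits(hits)]
)

# Alternative: the single nearest gene for each peak, with the distance in bp.
nearest_hits <- distanceToNearest(peaks, genes, ignore.strand = TRUE)
nearest_gene <- data.frame(
  peak = peaks$name[queryHits(nearest_hits)],
  gene = genes$gene_id[subjectHits(nearest_hits)],
  distance = mcols(nearest_hits)$distance
)

print(within_window)
print(nearest_gene)
```

Setting `window` to 2000 would give the 2 kb case. The findOverlaps() call only reports peaks that have a gene within the window, whereas distanceToNearest() reports the closest gene for every peak and leaves any distance cut-off to you.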

With regard to why one would annotate regions that are up to 1kb, etc., away, well, that depends on your marker of study and how it is known to behave and affect gene transcription.


toralmanvar wrote, 27 days ago:

You can try using annotatePeaks.pl, which is a simple Perl script that is part of the HOMER package. You basically need 3 files for running it:

  1. Peak BED file
  2. Reference genome FASTA file
  3. Reference genome annotation (GTF/GFF) file

And the command goes like this:

perl annotatePeaks.pl sample_peak.bed Genome.fasta -gtf genome.gtf > homer_annotation.txt
