Question: Mapping and annotating DNA binding regions from ChIP-Seq to nearby gene
oriolebaltimore60 wrote:

Hi: I have a BED file of transcription factor binding sites (mm9). I want to annotate these sites, which are typically located in intergenic regions, with a nearby gene. I checked a publication (PMC3080774) that lists some tools that can map and annotate. I tried using ChIPpeakAnno. It was straightforward for the first two commands, and then it became confusing.

    library(ChIPpeakAnno)
    bed <- system.file("extdata", "MACS_output.bed", package="ChIPpeakAnno")
    gr1 <- toGRanges(bed, format="BED", header=FALSE)
    ## one can also try import from rtracklayer
    gff <- system.file("extdata", "GFF_peaks.gff", package="ChIPpeakAnno")
    gr2 <- toGRanges(gff, format="GFF", header=FALSE, skip=3)
    ## must keep the class exactly same as gr1$score, i.e., numeric.
    gr2$score <- as.numeric(gr2$score)
    ol <- findOverlapsOfPeaks(gr1, gr2)
    ## add metadata (mean of score) to the overlapping peaks

The authors did not explain why they overlap gr1 and gr2. Before that, they say "## one can also try import from rtracklayer". They also did not point out the names of the annotations, such as mm8 or hg19.

My questions to this forum are:

  1. Given a BED file, how can I map and annotate the peak regions to nearby genes?
  2. Can I define a 1 kb or 2 kb range? (Apparently ChIPpeakAnno uses GRanges, so this should be possible; however, the lack of mm9 and of a proper explanation of why findOverlaps is run between gr1 and gr2 makes it confusing to deal with ranges.)
  3. Are there any tools other than ChIPpeakAnno to achieve the above two tasks?

Thanks Adrian

chip-seq annotation • 552 views
modified 12 months ago • written 12 months ago by oriolebaltimore60

Appreciate your suggestions. I will work on both. Thanks!

written 12 months ago by oriolebaltimore60

Moved your answer to a comment at the top level.

Okay, hope that it goes well. If your data is in the right format, then toralmanvar's suggestion may be easier to use. You had mentioned the use of ChIPpeakAnno in your question, though.

written 12 months ago by Kevin Blighe44k
Kevin Blighe44k wrote:

You can just use GenomicRanges with your peaks BED file and your annotation GFF (after they have both been converted to GRanges objects). The GenomicRanges findOverlaps() function has a parameter called maxgap, which allows you to set your distances of 1 kb, 2 kb, etc.
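
As a rough illustration of the findOverlaps() approach, here is a minimal sketch. It uses small made-up GRanges objects in place of real files; the coordinates, the gene names (geneA, geneB, geneC) and the variable names are all assumptions for the example. With real data, the peaks and genes would instead come from something like rtracklayer::import() on the BED and GFF files.

```r
## Minimal sketch with toy data; coordinates and gene names are made up.
suppressPackageStartupMessages(library(GenomicRanges))

## Toy peaks (with real data, e.g. rtracklayer::import() on the BED file)
peaks <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(1000, 50000, 7000),
                     end   = c(1200, 50200, 7300))
)

## Toy gene annotation (with real data, e.g. rtracklayer::import() on the GFF)
genes <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(1700, 80000, 9000),
                     end   = c(5000, 90000, 12000)),
  strand   = c("+", "-", "+"),
  gene     = c("geneA", "geneB", "geneC")
)

## Peaks that overlap a gene or lie within 1 kb of one
hits_1kb <- findOverlaps(peaks, genes, maxgap = 1000, ignore.strand = TRUE)
annot_1kb <- data.frame(
  peak = as.character(peaks[queryHits(hits_1kb)]),
  gene = genes$gene[subjectHits(hits_1kb)]
)

## Same idea with a 2 kb window
hits_2kb <- findOverlaps(peaks, genes, maxgap = 2000, ignore.strand = TRUE)
annot_2kb <- data.frame(
  peak = as.character(peaks[queryHits(hits_2kb)]),
  gene = genes$gene[subjectHits(hits_2kb)]
)

## Alternatively, report the nearest gene for every peak, with its distance
near <- distanceToNearest(peaks, genes, ignore.strand = TRUE)
annot_nearest <- data.frame(
  peak     = as.character(peaks[queryHits(near)]),
  gene     = genes$gene[subjectHits(near)],
  distance = mcols(near)$distance
)
```

A peak can match more than one gene inside the window, so annot_1kb and annot_2kb may contain several rows per peak, whereas distanceToNearest() returns at most one gene per peak. For mm9, gene coordinates could plausibly be taken from a GFF/GTF file or from the Bioconductor package TxDb.Mmusculus.UCSC.mm9.knownGene instead of the toy genes object.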

With regard to why one would annotate regions that are up to 1kb, etc., away, well, that depends on your marker of study and how it is known to behave and affect gene transcription.


modified 12 months ago • written 12 months ago by Kevin Blighe44k
toralmanvar770 wrote:

You can try a simple Perl script that is part of the HOMER package. You basically need 3 files to run it:

  1. Peak BED file
  2. Reference genome FASTA file
  3. Reference genome annotation (GTF/GFF) file

And the command goes like this:

perl sample_peak.bed Genome.fasta -gtf genome.gtf > homer_annotation.txt

written 12 months ago by toralmanvar770