Question: Mapping and annotating DNA binding regions from ChIP-Seq to nearby genes
oriolebaltimore (United States) wrote, 4 months ago:

Hi, I have a BED file of transcription factor binding sites (mm9). I want to annotate these sites, which are typically located in intergenic regions, with a nearby gene. I checked a publication (PMC3080774), which lists some tools that can map and annotate such sites. I tried ChIPpeakAnno. The first two commands were straightforward, but after that it got confusing:

    library(ChIPpeakAnno)
    bed <- system.file("extdata", "MACS_output.bed", package="ChIPpeakAnno")
    gr1 <- toGRanges(bed, format="BED", header=FALSE)
    ## one can also try import from rtracklayer
    gff <- system.file("extdata", "GFF_peaks.gff", package="ChIPpeakAnno")
    gr2 <- toGRanges(gff, format="GFF", header=FALSE, skip=3)
    ## must keep the class exactly same as gr1$score, i.e., numeric.
    gr2$score <- as.numeric(gr2$score)
    ol <- findOverlapsOfPeaks(gr1, gr2)
    ## add metadata (mean of score) to the overlapping peaks
    ol <- addMetadata(ol, colNames="score", FUN=mean)

The authors do not explain why they overlap gr1 and gr2. Just before that, they write "## one can also try import from rtracklayer" without elaborating. They also do not give the names of the annotations for genomes such as mm8 or hg19.

My questions to this forum are:

  1. Given a BED file, how can I map and annotate the peak regions to nearby genes?
  2. Can I define a 1 kb or 2 kb range? (Apparently ChIPpeakAnno uses GRanges, so this should be possible. However, the lack of mm9 annotation, and of any explanation of why findOverlapsOfPeaks() is run between gr1 and gr2, makes the ranges confusing to deal with.)
  3. Are there any tools other than ChIPpeakAnno that can do the two tasks above?

Thanks, Adrian


I appreciate your suggestions. I will work on both. Thanks!

(reply by oriolebaltimore, 4 months ago)

Moved your answer to a comment at the top level.

Okay, I hope that it goes well. If your data are in the right format, toralmanvar's suggestion may be easier to use. You did mention ChIPpeakAnno in your question, though.

(reply by Kevin Blighe, 4 months ago)
Kevin Blighe (Republic of Ireland) wrote, 4 months ago:

You can just use GenomicRanges with your peaks BED file and your annotation GFF (after both have been converted to GRanges objects). The findOverlaps() function in GenomicRanges has a parameter called maxgap, which lets you set distances of 1 kb, 2 kb, etc.
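
A minimal sketch of this approach, assuming made-up toy coordinates on chr1: gr_peaks and gr_genes stand in for your converted BED and GFF (with real files you could build them with rtracklayer's import()), and annotate_window(), peak_id and gene_name are hypothetical names chosen for illustration. With the default type = "any", findOverlaps() counts two ranges as a hit when the number of bases separating them is at most maxgap, so running it with 1000 and 2000 gives the 1 kb and 2 kb assignments. distanceToNearest() is shown as an alternative that reports the nearest gene and its distance for every peak, so a cutoff can be chosen afterwards.

```r
## Sketch only: toy data, hypothetical names (gr_peaks, gr_genes,
## annotate_window, peak_id, gene_name).
suppressPackageStartupMessages(library(GenomicRanges))

## Toy peaks (in practice: your BED file converted to GRanges).
gr_peaks <- GRanges(
    seqnames = "chr1",
    ranges   = IRanges(start = c(1000, 4500, 20000), width = 200),
    peak_id  = c("peak1", "peak2", "peak3")
)

## Toy genes (in practice: gene records from your GTF/GFF).
gr_genes <- GRanges(
    seqnames  = "chr1",
    ranges    = IRanges(start = c(1500, 7500), end = c(3000, 9000)),
    strand    = c("+", "-"),
    gene_name = c("geneA", "geneB")
)

## Report every peak/gene pair separated by at most `window` bases
## (measured to the gene body, since gr_genes holds whole genes).
annotate_window <- function(peaks, genes, window) {
    hits <- findOverlaps(peaks, genes, maxgap = window)
    data.frame(
        peak_id   = peaks$peak_id[queryHits(hits)],
        gene_name = genes$gene_name[subjectHits(hits)],
        window    = rep(window, length(hits))
    )
}

ann_1kb <- annotate_window(gr_peaks, gr_genes, 1000L)
ann_2kb <- annotate_window(gr_peaks, gr_genes, 2000L)
print(ann_1kb)
print(ann_2kb)

## Alternative: nearest gene and its distance for each peak.
nearest_hits <- distanceToNearest(gr_peaks, gr_genes)
nearest <- data.frame(
    peak_id   = gr_peaks$peak_id[queryHits(nearest_hits)],
    gene_name = gr_genes$gene_name[subjectHits(nearest_hits)],
    distance  = mcols(nearest_hits)$distance
)
print(nearest)
```

In this toy example peak1 lies 300 bases from geneA and is assigned at both window sizes, peak2 lies about 1.5 kb from geneA and only appears in the 2 kb result, and peak3 is assigned to nothing. If you would rather measure distance to the TSS than to the gene body, one option is to shrink the genes to their start sites first with resize(gr_genes, width = 1, fix = "start"), which respects strand.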

As for why one would annotate regions up to 1 kb (or more) away: that depends on the marker you are studying and how it is known to behave and to affect gene transcription.

toralmanvar wrote, 4 months ago:

You can try using a simple Perl script that is part of the HOMER package. You basically need three files to run it:

  1. Peak BED file
  2. Reference genome FASTA file
  3. Reference genome annotation (GTF/GFF) file

The command looks like this (the script name is missing from the original post):

    perl <script> sample_peak.bed Genome.fasta -gtf genome.gtf > homer_annotation.txt
