Mapping and annotating DNA binding regions from ChIP-Seq to nearby genes
4.8 years ago

Hi, I have a BED file of transcription factor binding sites (mm9). I want to annotate these sites, which are typically located in intergenic regions, to nearby genes. A publication (PMC3080774) lists several tools that can map and annotate such sites, and I tried ChIPpeakAnno. The first two commands of its pipeline vignette (https://bioconductor.org/packages/release/bioc/vignettes/ChIPpeakAnno/inst/doc/pipeline.html) were straightforward, but after that it became confusing:

library(ChIPpeakAnno)
bed <- system.file("extdata", "MACS_output.bed", package="ChIPpeakAnno")
gr1 <- toGRanges(bed, format="BED", header=FALSE)
## one can also try import from rtracklayer
gff <- system.file("extdata", "GFF_peaks.gff", package="ChIPpeakAnno")
gr2 <- toGRanges(gff, format="GFF", header=FALSE, skip=3)
## must keep the class exactly same as gr1$score, i.e., numeric.
gr2$score <- as.numeric(gr2$score)
## find peaks shared between the two peak sets
ol <- findOverlapsOfPeaks(gr1, gr2)
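For reference, a BED file is tab-separated text whose first three columns are the chromosome, a 0-based start and an end-exclusive end, with optional columns (name, score, ...) after that. The minimal stdlib Python sketch below only makes that coordinate convention explicit; `Interval` and `parse_bed` are hypothetical names, and in R the `toGRanges()` call above (or rtracklayer's import) would do this step for you.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Interval:
    """A genomic interval in BED convention: 0-based start, end exclusive."""
    chrom: str
    start: int
    end: int
    name: str = "."


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Parse BED lines, skipping blank, comment, 'track' and 'browser' lines."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else "."
        intervals.append(Interval(chrom, start, end, name))
    return intervals


# Hypothetical example peaks (not real data).
example = [
    "track name=peaks",
    "chr1\t3000100\t3000600\tpeak1\t85",
    "chr2\t500\t900\tpeak2\t40",
]
peaks = parse_bed(example)
print(peaks[0])
# Interval(chrom='chr1', start=3000100, end=3000600, name='peak1')
```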


The authors do not explain why they overlap gr1 and gr2. Just before that, they write "## one can also try import from rtracklayer" without explaining it. They also do not say which annotation names to use for mm8 or hg19.
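Judging only by the file names in the vignette, gr1 (MACS_output.bed) and gr2 (GFF_peaks.gff) both look like peak sets, so the overlap step presumably finds peaks shared between two peak sets (for example two replicates or two peak callers) rather than annotating anything; with a single BED file it may simply not apply. The idea is sketched below in stdlib Python. This is not ChIPpeakAnno's implementation (which, as far as I know, also merges overlapping peaks and reports counts); the peak sets and helper names are hypothetical.

```python
from typing import List, Tuple

# (chrom, start, end) in BED convention: 0-based start, end exclusive.
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two half-open intervals on the same chromosome share >= 1 base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def shared_peaks(set1: List[Peak], set2: List[Peak]) -> List[Tuple[Peak, Peak]]:
    """All (peak from set1, peak from set2) pairs that overlap.

    Brute force O(n*m): fine for a sketch, too slow for genome-wide peak sets,
    where an interval tree or a sorted sweep is the usual approach.
    """
    return [(p, q) for p in set1 for q in set2 if overlaps(p, q)]


# Hypothetical peak sets, e.g. from two peak callers or two replicates.
set_a = [("chr1", 100, 200), ("chr1", 1000, 1100), ("chr2", 50, 80)]
set_b = [("chr1", 150, 300), ("chr2", 80, 120)]

pairs = shared_peaks(set_a, set_b)
print(pairs)
# [(('chr1', 100, 200), ('chr1', 150, 300))]
```

Note that ("chr2", 50, 80) and ("chr2", 80, 120) only touch: with half-open coordinates they share no base, so they are not reported.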

My questions to the forum are:
1. Given a BED file, how can I map and annotate the peak regions to nearby genes?
2. Can I define a 1 kb or 2 kb range? (ChIPpeakAnno apparently uses GRanges, so this should be possible, but the lack of mm9 annotation and of any explanation of why findOverlapsOfPeaks() is run on gr1 and gr2 makes the ranges confusing to deal with.)
3. Are there any tools other than ChIPpeakAnno that can do these two tasks?
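For questions 1 and 2, the core idea behind nearest-gene annotation with a window can be sketched as: find the closest gene TSS to each peak and keep it only if it lies within 1 kb, 2 kb, etc. The stdlib Python sketch below assumes a hypothetical gene table (gene -> chromosome, TSS, strand) and measures distance from the peak centre; real tools differ in details such as using peak edges instead of the centre, or reporting several genes.

```python
from typing import Dict, Optional, Tuple

# Hypothetical gene table: gene name -> (chrom, TSS position, strand).
GeneTSS = Dict[str, Tuple[str, int, str]]


def nearest_gene(chrom: str, start: int, end: int, genes: GeneTSS,
                 window: int) -> Optional[Tuple[str, int]]:
    """Return (gene, signed distance) for the closest TSS within `window` bp, else None.

    Distance runs from the peak centre to the TSS, oriented by the gene's strand:
    negative means the peak lies upstream of the TSS, positive downstream.
    """
    centre = (start + end) // 2
    best = None
    for gene, (g_chrom, tss, strand) in genes.items():
        if g_chrom != chrom:
            continue
        dist = centre - tss if strand == "+" else tss - centre
        if abs(dist) <= window and (best is None or abs(dist) < abs(best[1])):
            best = (gene, dist)
    return best


genes = {
    "geneA": ("chr1", 5000, "+"),
    "geneB": ("chr1", 8000, "-"),
    "geneC": ("chr2", 100, "+"),
}

for window in (1000, 2000):
    print(window, nearest_gene("chr1", 6500, 6700, genes, window))
# 1000 None
# 2000 ('geneB', 1400)
```

The same peak is unannotated with a 1 kb window but assigned to geneB with a 2 kb window, which is exactly the choice question 2 is about.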


I appreciate your suggestions. I will try both. Thanks!


Okay, I hope it goes well. If your data are in the right format, toralmanvar's suggestion may be easier to use. You did mention ChIPpeakAnno in your question, though.

4.8 years ago

You can simply use GenomicRanges with your peak BED file and your annotation GFF (after both have been converted to GRanges objects). The findOverlaps() function in GenomicRanges has a maxgap parameter that lets you set distances of 1 kb, 2 kb, etc.
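To make the maxgap idea concrete outside R, here is a stdlib Python sketch in the same spirit: report every gene whose body lies within `maxgap` bases of a peak. It is not GenomicRanges' implementation; GenomicRanges uses 1-based closed ranges whereas this sketch uses BED-style 0-based half-open coordinates, so boundary behaviour may differ by a base. The peaks, genes and helper names are hypothetical.

```python
from typing import List, Tuple

# (name, chrom, start, end) in 0-based, end-exclusive coordinates.
Region = Tuple[str, str, int, int]


def gap(a: Region, b: Region) -> int:
    """Bases separating two regions on the same chromosome (0 if they touch or overlap)."""
    return max(0, max(a[2], b[2]) - min(a[3], b[3]))


def genes_within(peaks: List[Region], genes: List[Region],
                 maxgap: int) -> List[Tuple[str, str, int]]:
    """For each peak, list (peak, gene, gap) for genes within `maxgap` bp of it."""
    hits = []
    for p in peaks:
        for g in genes:
            if p[1] == g[1] and gap(p, g) <= maxgap:
                hits.append((p[0], g[0], gap(p, g)))
    return hits


peaks = [("peak1", "chr1", 4200, 4400), ("peak2", "chr1", 12000, 12300)]
genes = [("geneA", "chr1", 5000, 7000), ("geneB", "chr1", 13500, 15000),
         ("geneC", "chr2", 0, 1000)]

print(genes_within(peaks, genes, 1000))
# [('peak1', 'geneA', 600)]
print(genes_within(peaks, genes, 2000))
# [('peak1', 'geneA', 600), ('peak2', 'geneB', 1200)]
```

Measuring against the whole gene body is one design choice; if you care about promoters, you would first shrink each gene to its TSS and then apply the same gap test.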

As for why one would annotate regions that are up to 1 kb (or more) away, that depends on the marker you are studying and how it is known to behave and affect gene transcription.

Kevin

4.8 years ago
Tm

You can try annotatePeaks.pl, a simple Perl script that is part of the HOMER package. You basically need three files to run it:

1. Peak bed file
2. reference genome fasta file
3. reference genome annotation (GTF/GFF) file

The command looks like this:

perl annotatePeaks.pl sample_peak.bed Genome.fasta -gtf genome.gtf > homer_annotation.txt
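The GTF is what supplies the gene positions that distances are measured against. For intuition, here is a stdlib Python sketch that extracts one TSS per gene from a GTF. It takes the outermost coordinates over all records sharing a gene_id, so it also works for GTFs that contain only exon/CDS-level lines (as, to my knowledge, UCSC Table Browser exports do), and sets the TSS to the leftmost base on "+" and the rightmost on "-". GTF is 1-based and end-inclusive, unlike BED, so subtract 1 before comparing these positions with BED coordinates. This is not how HOMER works internally; `gene_tss_from_gtf` is a hypothetical helper and the records are made up.

```python
from typing import Dict, Iterable, Tuple


def gene_tss_from_gtf(lines: Iterable[str]) -> Dict[str, Tuple[str, int, str]]:
    """Map gene_id -> (chrom, TSS, strand) from any GTF records carrying a gene_id."""
    spans = {}  # gene_id -> [chrom, min_start, max_end, strand]
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        gene_id = None
        for attr in fields[8].split(";"):
            key, _, value = attr.strip().partition(" ")
            if key == "gene_id":
                gene_id = value.strip('"')
        if gene_id is None:
            continue
        if gene_id not in spans:
            spans[gene_id] = [chrom, start, end, strand]
        else:
            span = spans[gene_id]
            span[1] = min(span[1], start)
            span[2] = max(span[2], end)
    # TSS = leftmost base for '+' genes, rightmost base for '-' genes (1-based).
    return {g: (c, s if st == "+" else e, st) for g, (c, s, e, st) in spans.items()}


def rec(*cols: str) -> str:
    """Join hypothetical GTF columns with tabs."""
    return "\t".join(cols)


gtf = [
    rec("chr1", "demo", "exon", "1001", "1200", ".", "+", ".", 'gene_id "geneA"; transcript_id "tA1";'),
    rec("chr1", "demo", "exon", "1500", "2000", ".", "+", ".", 'gene_id "geneA"; transcript_id "tA1";'),
    rec("chr1", "demo", "exon", "5000", "5300", ".", "-", ".", 'gene_id "geneB"; transcript_id "tB1";'),
    rec("chr1", "demo", "exon", "5600", "6000", ".", "-", ".", 'gene_id "geneB"; transcript_id "tB1";'),
]
print(gene_tss_from_gtf(gtf))
# {'geneA': ('chr1', 1001, '+'), 'geneB': ('chr1', 6000, '-')}
```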