Question: H3K4me3 occupancy visualisation
gravatar for Constantine
5.0 years ago by
Constantine280 wrote:


I have an issue when it comes to visualising NGS data (ChIP-Seq). The image below represents what is currently known about occupancy of H3K4me3. 


I've downloaded many different data of H3K4me3 ChIP-Seq from the EncodeProject and when I try to visualize H3K4me3 occupancy I get something like this:

Initially, I used the already uploaded BigWig and Bed files (both from mouse and human) to generate the plots. As this didn't give me the expected result, I downloaded the fastq files, aligned them to different tools and used different peakcallers. In the end, I was getting the same plot. 

To plot occupancy I've used either ngsplot or deeptools. 

Am I missing something here? I've used more than 20 datasets and I end up with the same graph

Thank you  in advance


chip-seq • 1.9k views
ADD COMMENTlink modified 12 months ago by Biostar ♦♦ 20 • written 5.0 years ago by Constantine280
gravatar for TriS
5.0 years ago by
United States, Buffalo
TriS4.2k wrote:

have you tried using the same files and same tools used to create the first image that you posted? 

you can create the image on the right using ChIPSeeker 

I also used a tweak by using the tagmatrix from ChIPSeeker and plotting the number of tags to get something similar to your graph on the left. here's the code, it starts from a bed-formatted file. I used it with ENCODE data and it works as expected.


library(ChIPseeker); library(TxDb.Hsapiens.UCSC.hg19.knownGene)
# get TSS info
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
promoter <- getPromoters(TxDb=txdb, upstream=2000, downstream=2000)
#read data
data<- readPeakFile(peakfile = "data.narrowPeak")
# get tag matrix
tagMatrix <- getTagMatrix(data, windows=promoter)
# calculate how many tags you have at each position
tags <- c()
for(i in 1:dim(tagMatrix)[2]){
  tags <- c(tags, sum(tagMatrix[,i]))

# plot it
plot(tags, type="n", ylim=c(0, 120), frame.plot=F, xlab="", ylab="Binding",xaxt="n")
axis(1, at=seq(0, 4000, 1000),labels=c("","","","",""),col.axis="black", las=2, cex.axis=1.2, tck=-.01)

mtext(side=1,text=c("-2kb", "-1kb", "TSS", "+1kb", "+2kb"), at=seq(0, 4000, 1000), line=0.4)
mtext(side=1,text=("Distance to TSS"), at=2000, line=2, cex=1.4)
lines(tags, col="blue", lwd=5)
ADD COMMENTlink written 5.0 years ago by TriS4.2k

I am writing because I am confused as to the definition of output of these two functions: getTagMatrix and plotAvgProf from the Chipseeker, it does not show the 'read count frequency' as the vignette of their output graphs is labelled. The label is a bit misleading. In the plot you generate with ChipSeeker we do not even input any information about the number of reads. So to me the plot generated is the frequency or number of the genomic locations in the bed file (eg, peaks) around the TSS.

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by molla.linda80
gravatar for Fidel
5.0 years ago by
Fidel1.9k wrote:

I agree with Pierre, the image that you showed as an example is probably highlighting the -1 and +1 nucleosomes. To get such picture you need to a sharp and narrow positioning of the reads.

With deepTools you can center the reads over the expected fragment length in order to get sharper peaks (using either bamCoverage or bamCompare). Also, in the image that you showed there is a peak between the TSS and the TES. To get the image that you want you need to use the referencePoint option around TSS.



ADD COMMENTlink written 5.0 years ago by Fidel1.9k
gravatar for Alternative
5.0 years ago by
Alternative240 wrote:

This has nothing to do with the tool that you are using. This is biological. Are you taking into account the directionality of your peaks/TSS when you are doing your plots (+/- strands)? What you see in the K4me3 plot up is that the signal on +1 nucleosome is higher than the -1 nucleosome. If you are not taking the directionality into account, both +1 and -1 will cancel/add to each other and the profile looks like the one you are having.

Also, notice the small shoulder in your plot at the 5' and 3' of your peaks.

Hope this will help

ADD COMMENTlink written 5.0 years ago by Alternative240
gravatar for Chris Fields
5.0 years ago by
Chris Fields2.1k
University of Illinois Urbana-Champaign
Chris Fields2.1k wrote:

The output from the first file looks quite a bit like metaseq.  As TriS mentioned, might be worth trying to replicate the output using that toolkit.

ADD COMMENTlink written 5.0 years ago by Chris Fields2.1k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1579 users visited in the last hour