Question: Heatmap Of Read Coverage Around Tsss
8
gravatar for A. Domingues
5.7 years ago by
A. Domingues2.0k
Mainz, Germany
A. Domingues2.0k wrote:

I am trying to plot a heatmap of read density around a feature of interest (TSSs) very common in genomics papers. something like this (B):

enter image description here

However, I am struggling a bit in getting to look "right". A bit of background: I have mapped ChIP-seq reads for pol2 and calculate the coverage, per nucleotide, using bedtools.

coverageBed -d -abam $bamFile -b $TSSs > $coverage.bed
# output:
chr1    67108226    67110226    uc001dct.3    16    +    1    10
chr1    67108226    67110226    uc001dct.3    16    +    2    10
chr1    67108226    67110226    uc001dct.3    16    +    3    10
chr1    67108226    67110226    uc001dct.3    16    +    4    10
chr1    67108226    67110226    uc001dct.3    16    +    5    8
chr1    67108226    67110226    uc001dct.3    16    +    6    8
chr1    67108226    67110226    uc001dct.3    16    +    7    8
chr1    67108226    67110226    uc001dct.3    16    +    8    8
chr1    67108226    67110226    uc001dct.3    16    +    9    8
chr1    67108226    67110226    uc001dct.3    16    +    10    8

Then in R, the genomic position, in column 7, is converted to relative position to the TSS and read counts normalized to the library size. This is converted to a numeric matrix with each row being a TSS and each column the relative nucleotide position. For the plotting the matrix is ordered number of reads per TSS, and the values logged. This is the outcome:

heatmap(cov.mlog, Rowv=NA, Colv=NA, scale="column", labCol = FALSE, labRow = FALSE, col=brewer.pal(9, "Greens"), margins = c(5, 5))

enter image description here

Although the average coverage plot looks as one would expect for pol2, the heatmap is, well, not great. My question is what am I doing wrong and how to improve it? In this and this paper, the coverage is calculate for a bin of nucleotides and not per nucleotide as I am doing. Would that improve visualization? How to do it in bedtools? Should the sorting of the matrix be done differently?

I am aware that could use ngsplot for this, but I am trying to avoid it because my implementation would fit better with my other analysis.

Thank you!

bedtools coverage heatmap • 8.9k views
ADD COMMENTlink modified 5.6 years ago by Ying W3.9k • written 5.7 years ago by A. Domingues2.0k
6
gravatar for Björn
5.6 years ago by
Björn650
Germany
Björn650 wrote:

Hi,

you may find deeptools useful.

Example plots can be found here: http://f1000.com/posters/browse/summary/1094053

Furthermore, deeptools can create bigwig files that are normalized or ratios. As a bonus a Galaxy wrapper is available in the ToolShed.

ADD COMMENTlink written 5.6 years ago by Björn650

I have bigwig files for input and IP - the input seems to have a lot more peaks and the IP almost looks flat. Can I actually compare these two files ?

Since the input has more reads 77578695 compared to 52746408 in the IP. I would expect the IP samples to have significantly enriched regions compared to the input, and as you can see the opposite is true in my samples.

enter image description here

https://s3.postimg.org/7ixq74p37/all_genomewide3.jpg

all_genomewide3.jpg

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by DataFanatic130
3
gravatar for Ming Tang
5.7 years ago by
Ming Tang2.4k
Houston/MD Anderson Cancer Center
Ming Tang2.4k wrote:

Hi,

you may use Homer http://biowhat.ucsd.edu/homer/ngs/quantification.html to get the coverage for each TSS, and cluster with cluster3.0 and visualize by Java Treeviewer http://bonsai.hgc.jp/~mdehoon/software/cluster/. Or you can use R to draw the heatmap after you get the matrix.

See if you get similar results.

ADD COMMENTlink written 5.7 years ago by Ming Tang2.4k

@tangming2005, as stated in the question, I already have the coverage for each TSS with nucleotide resolution, and the matrix. The plot I have created was also using heatmap in R. The question is about improving it by reducing the resolution for instance, and how to do that with bedtools and R.

ADD REPLYlink written 5.7 years ago by A. Domingues2.0k

Using Homer is not always a good choice for TSS annotation. It's very inflexible, you can't add your own updated Annotation file, its very difficult to customize or reverse engineer the author's method of doing annotations which are one-size-fits all. For example, he chooses the TSS as default defined from -1kb to +100bp. This is inappropriate for C. elegans (where promoter may only be a few hundred basepairs and median distance between worm genes is only 2.5kb), despite the fact that he provides ce10 annotation using this standard. I ended up having a lot of misannotation of peaks (bc these TSS regions are prioritized above everything else, even when maybe it would be better annotated as an intron of a different gene, etc.)

I have also not found the author to be responsive to emails. I ended up writing my own custom annotation script using Bedtools and Ruby and Wormbase gff files.

ADD REPLYlink modified 5.1 years ago • written 5.1 years ago by margiezilla90
1
gravatar for Ying W
5.6 years ago by
Ying W3.9k
South San Francisco, CA
Ying W3.9k wrote:

not sure what your bedtools output is supposed to show, seems to me like you are getting lots of different coverage at the same location/base pair?

For the heatmap, try using less TSS (start w/top 500) and also go a bit further out (+/- 1kb at least) and use a darker color too

ADD COMMENTlink written 5.6 years ago by Ying W3.9k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 785 users visited in the last hour