Heatmap Of Read Coverage Around Tsss
3
8
Entering edit mode
9.0 years ago
A. Domingues ★ 2.6k

I am trying to plot a heatmap of read density around a feature of interest (TSSs) very common in genomics papers. something like this (B):

However, I am struggling a bit in getting to look "right". A bit of background: I have mapped ChIP-seq reads for pol2 and calculate the coverage, per nucleotide, using bedtools.

coverageBed -d -abam $bamFile -b$TSSs > \$coverage.bed
# output:
chr1    67108226    67110226    uc001dct.3    16    +    1    10
chr1    67108226    67110226    uc001dct.3    16    +    2    10
chr1    67108226    67110226    uc001dct.3    16    +    3    10
chr1    67108226    67110226    uc001dct.3    16    +    4    10
chr1    67108226    67110226    uc001dct.3    16    +    5    8
chr1    67108226    67110226    uc001dct.3    16    +    6    8
chr1    67108226    67110226    uc001dct.3    16    +    7    8
chr1    67108226    67110226    uc001dct.3    16    +    8    8
chr1    67108226    67110226    uc001dct.3    16    +    9    8
chr1    67108226    67110226    uc001dct.3    16    +    10    8


Then in R, the genomic position, in column 7, is converted to relative position to the TSS and read counts normalized to the library size. This is converted to a numeric matrix with each row being a TSS and each column the relative nucleotide position. For the plotting the matrix is ordered number of reads per TSS, and the values logged. This is the outcome:

heatmap(cov.mlog, Rowv=NA, Colv=NA, scale="column", labCol = FALSE, labRow = FALSE, col=brewer.pal(9, "Greens"), margins = c(5, 5))


Although the average coverage plot looks as one would expect for pol2, the heatmap is, well, not great. My question is what am I doing wrong and how to improve it? In this and this paper, the coverage is calculate for a bin of nucleotides and not per nucleotide as I am doing. Would that improve visualization? How to do it in bedtools? Should the sorting of the matrix be done differently?

I am aware that could use ngsplot for this, but I am trying to avoid it because my implementation would fit better with my other analysis.

Thank you!

heatmap coverage bedtools • 12k views
6
Entering edit mode
8.9 years ago
Björn ▴ 670

Hi,

you may find deeptools useful.

Example plots can be found here: http://f1000.com/posters/browse/summary/1094053

Furthermore, deeptools can create bigwig files that are normalized or ratios. As a bonus a Galaxy wrapper is available in the ToolShed.

0
Entering edit mode

I have bigwig files for input and IP - the input seems to have a lot more peaks and the IP almost looks flat. Can I actually compare these two files ?

Since the input has more reads 77578695 compared to 52746408 in the IP. I would expect the IP samples to have significantly enriched regions compared to the input, and as you can see the opposite is true in my samples.

https://s3.postimg.org/7ixq74p37/all_genomewide3.jpg

all_genomewide3.jpg

3
Entering edit mode
9.0 years ago
Ming Tang ★ 2.7k

Hi,

you may use Homer http://biowhat.ucsd.edu/homer/ngs/quantification.html to get the coverage for each TSS, and cluster with cluster3.0 and visualize by Java Treeviewer http://bonsai.hgc.jp/~mdehoon/software/cluster/. Or you can use R to draw the heatmap after you get the matrix.

See if you get similar results.

0
Entering edit mode

@tangming2005, as stated in the question, I already have the coverage for each TSS with nucleotide resolution, and the matrix. The plot I have created was also using heatmap in R. The question is about improving it by reducing the resolution for instance, and how to do that with bedtools and R.

0
Entering edit mode

Using Homer is not always a good choice for TSS annotation. It's very inflexible, you can't add your own updated Annotation file, its very difficult to customize or reverse engineer the author's method of doing annotations which are one-size-fits all. For example, he chooses the TSS as default defined from -1kb to +100bp. This is inappropriate for C. elegans (where promoter may only be a few hundred basepairs and median distance between worm genes is only 2.5kb), despite the fact that he provides ce10 annotation using this standard. I ended up having a lot of misannotation of peaks (bc these TSS regions are prioritized above everything else, even when maybe it would be better annotated as an intron of a different gene, etc.)

I have also not found the author to be responsive to emails. I ended up writing my own custom annotation script using Bedtools and Ruby and Wormbase gff files.

1
Entering edit mode
8.9 years ago
Ying W ★ 4.1k

not sure what your bedtools output is supposed to show, seems to me like you are getting lots of different coverage at the same location/base pair?

For the heatmap, try using less TSS (start w/top 500) and also go a bit further out (+/- 1kb at least) and use a darker color too