Deeptools plotHeatmap: too much black! missing data assigned incorrectly
5.8 years ago
David K ▴ 40

Hello, I'm using deeptools computeMatrix/plotHeatmap, and am getting a large amount of missing data (i.e. black in the graph).

Commands used

The call to computeMatrix takes in ChIPseq peaks, and corresponding signal in bigWig format:

Command 1: computeMatrix

computeMatrix reference-point --referencePoint center -S file1.bw file2.bw file3.bw file4.bw -R peaks.bed -a 500 -b 500 -o matrix.gz

Note: the actual filenames and the -p option are omitted.

Note: --binSize is either omitted (default) or set to 1.

This is followed by:

Command 2: plotHeatmap

plotHeatmap -m $infile -o $ofile --outFileSortedRegions ${ofile/.$outformat/.bed} --sortUsing mean --outFileNameMatrix ${ofile/.$outformat/.mat} --zMin 0

Where: infile=matrix.gz (output from first command); ofile=matrix.$outformat; outformat typically equals "png"
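For readers unfamiliar with the `${ofile/.$outformat/.bed}` syntax, here is a minimal sketch of how the output names are derived. The command above uses bash pattern substitution; the sketch uses POSIX suffix removal instead (an assumption made only so it runs in any shell), which yields the same names for these inputs. The variable names `sorted_bed` and `matrix_out` are hypothetical.

```shell
# Values as described in the post.
outformat=png
ofile=matrix.$outformat

# Strip the ".png" suffix and append the new extension.
# bash equivalent from the post: ${ofile/.$outformat/.bed}
sorted_bed=${ofile%."$outformat"}.bed
matrix_out=${ofile%."$outformat"}.mat

echo "$sorted_bed"
echo "$matrix_out"
```

So the sorted regions go to matrix.bed and the values used for the heatmap go to matrix.mat.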

Output

I have verified that the data are indeed not missing, in at least the first few examples. See the highlighted area of the attached graph (red box).

plotHeatmap screenshot

Here is a chunk of the corresponding data stored via the --outFileNameMatrix option:

1.256 1.256 1.256 1.256 1.256 1.256 1.256 1.256 1.256 1.256 1.346 1.346 1.346

I've validated these numbers by looking specifically at this region of the input file (equivalent to file1.bw in Command 1):

#converted to wiggle
track type=wiggle_0
fixedStep chrom=chrII start=2378124 span=1 step=1
1.25641036
1.25641036
1.25641036
1.25641036
1.25641036
1.25641036
1.25641036
1.25641036
1.25641036
1.25641036
1.34615386
1.34615386
1.34615386

Note: this matches the above matrix output when --binSize is set to 1 for Command 1: computeMatrix; the default setting (10) gives the expected results.

This value range should be printed as yellow according to the figure legend (not in screenshot), and represents the middle range of the data being plotted.

Also, after adding the computeMatrix parameter --missingDataAsZero, I would expect all of that black to become red, right? That is not what this plot shows:

plotHeatmap with black missing data versus zeroes

Question

Does anyone know what might be going on here? Am I running the commands incorrectly (newb)?

deeptools plotheatmap missing data computeMatrix • 6.9k views

Tagging: Devon Ryan


Annoyingly, I don't actually get notified by these :(

5.8 years ago

Every time this issue is brought up it turns out to be due to the interpolation done in order to make a large matrix fit into a much smaller number of pixels. Try doing one of the following:

  1. Make the image larger so there's less interpolation.
  2. Increase the DPI.
  3. Change the --interpolationMethod option.

I have yet to see an example where that doesn't produce the results you're expecting. At the end of the day this comes down to how best to compress a lot of information into a small image. At some point you have to throw information away since there aren't enough pixels to represent everything, so no particular method will always produce the most desirable results.
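As a rough sketch of the three suggestions combined into one plotHeatmap call: the flags `--heatmapHeight` (in cm), `--dpi`, and `--interpolationMethod` are plotHeatmap options, but the values 40 and 300 and the output filename are illustrative assumptions, not recommendations from this thread. The command is only executed if plotHeatmap and the matrix file are present; otherwise it is printed.

```shell
infile=matrix.gz              # output of computeMatrix
ofile=matrix_nearest.png      # hypothetical output name

# Build the argument list once so it can be either run or printed.
set -- -m "$infile" -o "$ofile" \
    --zMin 0 \
    --interpolationMethod nearest \
    --heatmapHeight 40 \
    --dpi 300

if command -v plotHeatmap >/dev/null 2>&1 && [ -f "$infile" ]; then
    plotHeatmap "$@"
else
    # Tool or input missing: show the command instead of running it.
    echo "plotHeatmap $*"
fi
```

Changing one option at a time makes it easier to see which one removes the spurious missing-data pixels.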


Thanks! Changing the interpolation to "bicubic" made it worse, but "nearest" restored the data. I used "--sortRegions keep" in order to compare missing values as zeroes versus NaNs. How does sorting take the missing data into account?

Thanks for the solution!


Missing data is ignored during all sorting steps, since otherwise you end up with a block of nicely sorted regions at the top and another block of unsorted regions containing missing data at the bottom. The heatmap and the output file will have the same sorting order in all cases.
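To make "ignored during sorting" concrete, here is a toy sketch that assumes it means nan bins are skipped when the per-region mean (as with --sortUsing mean) is computed. This is an illustration with made-up values, not deepTools' actual code.

```shell
# Two toy regions, one per line; the second has missing bins (nan).
rows='1 2 3
4 nan nan'

# Per-row mean over the non-missing bins only.
means=$(printf '%s\n' "$rows" | awk '{
    sum = 0; n = 0
    for (i = 1; i <= NF; i++) {
        if ($i != "nan") { sum += $i; n++ }
    }
    if (n > 0) print sum / n; else print "nan"
}')

echo "$means"
```

Under this assumption the second region still gets a usable mean (4) from its single non-missing bin, so it can be ranked alongside complete regions rather than pushed into a separate unsorted block.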
