I've been using deeptools to make heatmaps of ChIP-seq samples and to hierarchically cluster those samples. I have two transcription factors of interest, and I want to determine where each one binds individually, where they bind jointly, and whether one TF gains binding when the other is mutant (for example, if TF "A" is mutant, does the ChIP-seq of B show unique binding regions that we don't see when TF A is WT?).
So far I have made a union file of all peaks. I first combined the BAM files for all conditions of each transcription factor and called peaks with MACS2 for each TF. I then used bedtools to take the union of the peaks and remove duplicates, which gave a list of about 8,000 sites. Finally, I used bedtools slop to make sure the peak regions were all 1 kb long, and used the result as my reference-point list for computeMatrix.
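To make the intended result of that last step concrete, here is a minimal Python sketch of resizing each region to a fixed 1 kb window around its midpoint. It illustrates the outcome I wanted, not the bedtools command I actually ran; `resize_to_fixed_width` and the toy coordinates are made up for the example, and chromosome ends are not handled.

```python
def resize_to_fixed_width(chrom, start, end, width=1000):
    """Return a BED-style interval of `width` bp centered on the region midpoint.

    Hypothetical helper for illustration only. It clamps at position 0 but
    does not know chromosome lengths, so it cannot clamp at chromosome ends.
    """
    mid = (start + end) // 2
    new_start = max(0, mid - width // 2)  # avoid negative coordinates
    new_end = new_start + width           # keep every region exactly `width` bp
    return chrom, new_start, new_end


# Toy peaks (made-up coordinates, not from my data).
peaks = [("chr1", 1200, 1450), ("chr1", 100, 300), ("chr2", 5000, 7000)]
resized = [resize_to_fixed_width(*p) for p in peaks]

for chrom, start, end in resized:
    print(chrom, start, end, sep="\t")
```

With fixed-width regions like these as input, the computeMatrix call was: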
computeMatrix reference-point --referencePoint center -R merge_TF1_TF2_slop.bed -S X_mut1_TF1.bigwig X_mut1_TF2.bigwig X_mut2_TF1.bigwig X_mut2_TF2.bigwig X_WT_TF1.bigwig X_WT_TF2.bigwig --skipZero -o matrix_TF1_TF2_unionPeaks.gz
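As I understand reference-point mode, each region is reduced to a single point (here its center) and signal is collected in fixed-size bins on either side of that point. A rough Python sketch of that idea follows. The helper `center_window_bins` is hypothetical, and the flank and bin sizes (500 bp and 10 bp) are chosen only for illustration; they are not taken from my command or from deeptools itself.

```python
def center_window_bins(start, end, flank=500, bin_size=10):
    """List the (bin_start, bin_end) intervals covering center +/- flank.

    Illustrative only: the flank and bin sizes are assumptions, and the
    exact rounding deeptools uses for the center may differ.
    """
    center = (start + end) // 2
    lo = max(0, center - flank)
    hi = center + flank
    return [(s, min(s + bin_size, hi)) for s in range(lo, hi, bin_size)]


# A made-up 1 kb region.
bins = center_window_bins(10_000, 11_000)
print(len(bins), bins[0], bins[-1])
```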
I used "center" because the only other choices were TSS and TES, and I wasn't sure which was the best option. I then plotted the matrix with plotHeatmap, with the goal of using k-means or hclust to force the peaks to separate into distinct clusters according to which samples they appear in (for example, peaks in samples 1 and 3, but not in 2, 4, 5, and 6). The plotHeatmap command was:
plotHeatmap -m matrix_TF1_TF2_unionPeaks.gz -o matrix_TF1_TF2_unionPeaks.png --outFileSortedRegions matrix_TF1_TF2_cluster.bed --colorMap RdBu --whatToShow 'heatmap and colorbar' --zMin -2 -2 0 --zMax 2 2 3 --hclust 7
The resulting plot didn't show any clear clustering of the samples (at the very least, I anticipated that the TFs would cluster together), and I also didn't really understand the colorbar at the bottom.
As you can see in the heatmap, even though there are clusters, they don't seem to be distinct.
Does anyone have suggestions about what I am doing wrong in the method above? It could very well be that this is just the data and there are no clear clusters, but that would be perplexing, because the motif analysis I did on these samples suggests clear overlaps between the TF1 and TF2 samples.
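To be concrete about the grouping I am after (for example, peaks in samples 1 and 3 but not in 2, 4, 5, and 6), here is a minimal Python sketch that labels each union region by which samples have an overlapping peak. It assumes a separate peak list per sample, which is not how I called peaks above. The function names, sample names, and coordinates are all hypothetical toy values.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def membership_patterns(union_regions, peaks_by_sample):
    """Group union regions by their presence/absence pattern across samples."""
    samples = list(peaks_by_sample)  # keep the order the samples were given in
    groups = defaultdict(list)
    for chrom, start, end in union_regions:
        pattern = tuple(
            any(c == chrom and overlaps(start, end, s, e)
                for c, s, e in peaks_by_sample[name])
            for name in samples
        )
        groups[pattern].append((chrom, start, end))
    return samples, dict(groups)


# Toy inputs (made up for illustration, not real peak calls).
union_regions = [("chr1", 100, 1100), ("chr1", 5000, 6000), ("chr2", 200, 1200)]
peaks_by_sample = {
    "WT_TF1": [("chr1", 150, 400), ("chr2", 300, 500)],
    "WT_TF2": [("chr1", 300, 600)],
    "mut1_TF2": [("chr1", 350, 500), ("chr1", 5200, 5400)],
}

samples, groups = membership_patterns(union_regions, peaks_by_sample)
for pattern, regions in groups.items():
    present = [name for name, hit in zip(samples, pattern) if hit]
    print(present, regions)
```

This is a simple all-against-all scan, which is fine for a few thousand regions but would be slow for much larger inputs.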
Thank you for your time,
tagging: Devon Ryan