I am doing differential expression analysis. I am comparing two different experiments, each experiment consisting of two treatments and their respective controls in duplicate.
I used DESeq2 to generate a separate results object for each of the 4 treatment/control pairs, and I am doing downstream analysis on genes with an adjusted p-value below 0.01.
My question is about the discrepancy between calling genes differentially expressed by thresholding the adjusted p-value, which is a continuous quantity, and what I see when I plot the same genes in a heatmap. The adjusted p-values are taken from the DESeq2 results object generated for each condition.
I will illustrate this with two images. These images take into consideration only two of the 4 conditions.
The Venn diagram looks like this:
So in each condition a certain number of genes were differentially expressed and the overlaps between the two conditions are shown. In this example, in condition A there are 275 genes that are only differentially expressed in that condition.
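To be explicit about how the Venn sets are defined, here is a minimal sketch of the set logic in Python. The gene names and adjusted p-values are made up for illustration and are not from my data; only the 0.01 cutoff matches what I used.

```python
# Hypothetical adjusted p-values per condition (NOT real data).
ALPHA = 0.01  # same cutoff as in my analysis

padj_a = {"gene1": 0.001, "gene2": 0.004, "gene3": 0.20, "gene4": 0.03}
padj_b = {"gene1": 0.002, "gene2": 0.03, "gene3": 0.005, "gene4": 0.50}

# A gene is called differentially expressed if its adjusted p-value is below ALPHA.
sig_a = {gene for gene, p in padj_a.items() if p < ALPHA}
sig_b = {gene for gene, p in padj_b.items() if p < ALPHA}

only_a = sig_a - sig_b   # "unique to A" region of the Venn diagram
only_b = sig_b - sig_a   # "unique to B" region
shared = sig_a & sig_b   # overlap

print("only A:", sorted(only_a))
print("only B:", sorted(only_b))
print("shared:", sorted(shared))
```

In this toy example gene2 ends up in the "only A" set even though its adjusted p-value in B (0.03) is not far above the cutoff, so "only in A" really means "below the cutoff in A and not below it in B".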
However, when I create a heatmap of those genes, which should be exclusively differentially expressed in condition A, I observe that there is also an obvious difference in condition B, even if less strong. Note that the columns in the heatmap are ordered:
CTR  CTR  CTR  CTR  TREAT  TREAT  TREAT  TREAT
A    A    B    B    A      A      B      B
The heatmap tells a different story than the Venn diagram. Using the adjusted p-value threshold alone, I can define genes as differentially expressed in one condition only, yet the heatmap makes conditions A and B look much more similar for those same genes, as the clustering also shows.
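My suspicion is that this is a thresholding effect. As a minimal sketch, assuming a simple two-sided Wald-type test (log2 fold change divided by its standard error, compared to a standard normal) and ignoring multiple-testing adjustment, a gene with a similar effect in both conditions can fall on opposite sides of the cutoff. The fold changes and standard errors below are invented for illustration.

```python
import math

def wald_p_value(log2_fold_change, standard_error):
    """Two-sided p-value for z = LFC / SE under a standard normal."""
    z = log2_fold_change / standard_error
    return math.erfc(abs(z) / math.sqrt(2))

ALPHA = 0.01

# Hypothetical gene: similar effect in A and B, slightly noisier in B.
gene = {
    "A": (1.5, 0.40),  # (log2 fold change, standard error)
    "B": (1.2, 0.50),
}

p_values = {cond: wald_p_value(lfc, se) for cond, (lfc, se) in gene.items()}
significant = {cond: p < ALPHA for cond, p in p_values.items()}

for cond, (lfc, se) in gene.items():
    print(f"condition {cond}: LFC={lfc}, p={p_values[cond]:.4g}, "
          f"significant={significant[cond]}")
```

Here the gene would be counted as unique to A, although it changes in the same direction and by a comparable amount in B, which is what the heatmap seems to show for my 275 genes.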
Any tips or insight would be greatly appreciated.