Question: What to do when a differential expression heatmap looks like this?
25 days ago by
newbie50 wrote:


I'm working with RNA-seq data from 40 breast cancer samples: 26 are subtype A and 14 are subtype B.

I ran a differential expression analysis between subtype A and subtype B with edgeR, calling genes differentially expressed at FDR < 0.05.

The heatmap is shown below.

Column annotation colors:

Orange is subtype A
Dark green is subtype B

Of the 26 subtype A samples, 9 cluster together but apart from the other 17. You can see that clearly in the heatmap.

[Image: heatmap of the differentially expressed genes]

I also made an MDS plot. In the plot below, I circled the region where those 9 subtype A samples sit close to the subtype B samples.

[Image: MDS plot]

What should I do when the differential analysis heatmap looks like the one above? Is it a good idea to remove those 9 samples from the analysis based on clustering alone? If not, any suggestions would be welcome.


modified 23 days ago by eric.audemard0 • written 25 days ago by newbie50

There may be a batch effect. Were these samples sequenced on the same run? Were the RNA libraries prepared at the same time? With the same library kit? With the same RNA extraction method?

written 25 days ago by Nicolas Rosewick7.7k
25 days ago by
Sheffield, UK
i.sudbery4.7k wrote:

Cancer subtypes can be a difficult thing. Many cancer types can be broken down into quite distinct sub-subtypes. It's also possible that the criteria clinicians have used to assign subtypes are not as clear-cut as they would like. For example, in endometrial cancer, samples can traditionally be typed on two different systems: Type I vs Type II, and endometrioid vs serous. However, we find that the Type I/II classification doesn't make much sense from a molecular point of view.

The question is: what are you trying to gain from this analysis? If you want to know "what are the average gene expression differences between samples in these two classes", then use the DE results and don't worry about the heatmap (I'm not really a fan of using heatmaps just because you need to have a figure of some sort).

On the other hand, if you are interested in discovering the heterogeneity underlying cancer, of which the current subtyping schemes are but one example, then I would start with some sort of clustering, identify clusters, and then identify the genes driving the clusters (either from the gene dendrogram, or by doing DE between the de novo identified clusters). You might then find that one of the clusters corresponds to a traditional "sub-class".
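To make the "cluster first, then look at what defines the clusters" idea concrete, here is a minimal sketch of average-linkage hierarchical clustering. It is written in plain Python purely for illustration; the function name `average_linkage`, the sample names, and the toy expression values are all hypothetical, and a real analysis would more likely use an established clustering implementation on normalized expression values.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def average_linkage(samples, k):
    """Merge the two closest clusters repeatedly until only k remain.

    samples: dict mapping sample name -> list of expression values
    returns: list of clusters, each a sorted list of sample names
    """
    clusters = [[name] for name in samples]

    def linkage(a, b):
        # Average pairwise Euclidean distance between members of a and b.
        total = sum(dist(samples[x], samples[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Find the pair of clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy data: three genes measured in five samples.
toy = {
    "A1": [1.0, 1.2, 0.9],
    "A2": [1.1, 1.0, 1.0],
    "A3": [5.0, 5.2, 4.9],
    "B1": [5.1, 4.9, 5.0],
    "B2": [9.0, 9.1, 8.8],
}
clusters = average_linkage(toy, 3)
print(clusters)
```

In this toy input, sample "A3" ends up grouped with "B1" rather than with the other "A" samples, which is the kind of pattern described in the question. The de novo clusters, not the original labels, would then be the groups to compare in a follow-up DE analysis.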

written 25 days ago by i.sudbery4.7k

Thanks for the answer. I have a related question: how should I explain a DEA heatmap like the one above, where 9 subtype A samples sit away from the main cluster?

written 25 days ago by newbie50

There is nothing wrong with the DE list. It is still the set of genes that are on average different between subtype A and subtype B; it's just that subtype A vs subtype B might not be the most useful grouping.

You could check that the subtype annotation is correct. For example, if this were receptor-positive vs triple-negative breast cancer, you could look at the expression of the hormone receptors in each sample to make sure that receptor-positive samples haven't been annotated as triple negative, or vice versa.
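The annotation check described above could look something like the following sketch. It is in plain Python for illustration only. The function `flag_discordant`, the record layout, the expression values, and the cutoff of 3.0 are all hypothetical, and ESR1 (the estrogen receptor gene) is used as a stand-in marker; a sensible threshold depends on how your expression values are normalized.

```python
def flag_discordant(samples, gene, threshold):
    """Return names of samples whose annotation disagrees with marker expression.

    samples: list of dicts with keys "name", "annotation" ("positive" or
             "negative") and "expr" (dict of gene -> expression value)
    gene: marker gene to inspect
    threshold: hypothetical cutoff above which the gene counts as expressed
    """
    flagged = []
    for s in samples:
        expressed = s["expr"][gene] >= threshold
        annotated_positive = s["annotation"] == "positive"
        if expressed != annotated_positive:
            flagged.append(s["name"])
    return flagged


# Hypothetical toy records; the values are made up for illustration.
toy = [
    {"name": "S1", "annotation": "positive", "expr": {"ESR1": 7.2}},
    {"name": "S2", "annotation": "negative", "expr": {"ESR1": 0.4}},
    {"name": "S3", "annotation": "negative", "expr": {"ESR1": 6.8}},
]
flagged = flag_discordant(toy, "ESR1", threshold=3.0)
print(flagged)
```

Any flagged sample is only a candidate for a closer look at the clinical annotation, not proof of a labeling error.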

written 25 days ago by i.sudbery4.7k
23 days ago by
eric.audemard0 wrote:

Your heatmap is difficult to interpret, but it seems to be fine. As mentioned by i.sudbery, you very likely have 2 sub-subtypes within subtype A.

A good way to verify this is to display the dendrogram on the columns and split the columns into 3 or 4 clusters. The order of the clusters is arbitrary (you can swap them); what you mainly need to verify is whether each cluster is composed of a single subtype.

Below is an example built with R and pheatmap, with batch and subgroup tracks on the columns. We can see 4 subgroups, with several sub-subgroups highlighted by 11 clusters.

[Image: example heatmap built with R and pheatmap, with batch and subgroup tracks on the columns]
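Once the columns have been split into clusters, checking what each cluster is made of is a simple cross-tabulation. The sketch below is in plain Python for illustration; the function `crosstab`, the sample names, the cluster numbers, and the batch labels are all hypothetical, and it assumes you have already exported one cluster assignment per sample from whatever tool cut the dendrogram.

```python
from collections import Counter


def crosstab(cluster_of, label_of):
    """Count how many samples of each label fall into each cluster.

    cluster_of: dict mapping sample name -> cluster id
    label_of: dict mapping sample name -> label (subtype, batch, ...)
    returns: dict mapping cluster id -> Counter of labels
    """
    table = {}
    for sample, cluster in cluster_of.items():
        table.setdefault(cluster, Counter())[label_of[sample]] += 1
    return table


# Hypothetical toy assignments for six samples.
cluster_of = {"S1": 1, "S2": 1, "S3": 2, "S4": 2, "S5": 3, "S6": 3}
subtype_of = {"S1": "A", "S2": "A", "S3": "A", "S4": "B", "S5": "B", "S6": "B"}
batch_of = {"S1": "run1", "S2": "run1", "S3": "run2",
            "S4": "run2", "S5": "run2", "S6": "run1"}

by_subtype = crosstab(cluster_of, subtype_of)
by_batch = crosstab(cluster_of, batch_of)
for cluster in sorted(by_subtype):
    print(cluster, dict(by_subtype[cluster]), dict(by_batch[cluster]))
```

A cluster that mixes subtypes but is dominated by one batch hints at a technical effect, while a cluster that is pure in subtype but split from the rest of that subtype hints at a sub-subtype.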

modified 23 days ago • written 23 days ago by eric.audemard0