Question

What to do when a differential analysis heatmap looks like this?

4.9 years ago

newbie

Hello,

I'm working with RNA-seq data from breast cancer samples. There are 40 samples in total: 26 of subtype A and 14 of subtype B.

I ran a differential expression analysis between subtype A and subtype B samples with edgeR, calling genes differentially expressed at FDR < 0.05.
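For reference, the FDR column that edgeR reports (e.g. from topTags) is, by default, a Benjamini-Hochberg adjusted p-value. The sketch below is in Python rather than R, purely for illustration, and the p-values are made up. It shows what the adjustment computes: with n sorted p-values p_(1) <= ... <= p_(n), the adjusted value of p_(i) is the minimum over j >= i of min(1, p_(j) * n / j), and genes whose adjusted value is below 0.05 are called DE. It should agree with R's p.adjust(method = "BH"), but treat it as a sketch.

```python
import numpy as np

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)[::-1]        # indices from largest to smallest p-value
    ranks = np.arange(n, 0, -1)        # rank of each p-value in that order: n, n-1, ..., 1
    scaled = p[order] * n / ranks      # p_(i) * n / i
    adj = np.minimum.accumulate(scaled)  # running minimum from the largest p downwards
    adj = np.minimum(adj, 1.0)           # adjusted p-values cannot exceed 1
    out = np.empty(n)
    out[order] = adj                     # put values back in the original gene order
    return out

# Made-up p-values for 8 genes (hypothetical, not from the thread).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
fdr = bh_adjust(pvals)
de_mask = fdr < 0.05
print(fdr.round(4), de_mask)
```

Note how the three p-values around 0.04 are significant individually but not after adjustment.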

The heatmap looks like this:

Column annotation colors:

Orange: subtype A
Dark green: subtype B

Among the 26 subtype A samples, 9 cluster together, apart from the other 17. This is clearly visible in the heatmap.

[Figure: heatmap of the differentially expressed genes, columns annotated by subtype]

I also made an MDS plot. In the plot below, I circled the 9 subtype A samples that sit close to the subtype B samples.

[Figure: MDS plot, with the 9 subtype A samples circled]
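As a rough illustration of what an MDS plot like this measures: edgeR's plotMDS (via limma) by default uses a "leading log-fold-change" distance, i.e. the root-mean-square of the largest squared log-fold-changes between each pair of samples, and then places the samples in 2D with classical MDS. The Python/NumPy sketch below reimplements that idea on simulated log-CPM values. It is not edgeR's code. The group names, the effect size and top = 500 (which I believe mirrors the default, but check the documentation) are assumptions. In the simulation, a hypothetical group "A2" of subtype A samples shares B's signature genes, and it lands next to B.

```python
import numpy as np

def leading_logfc_distance(logcpm, top=500):
    """Pairwise RMS of the `top` largest squared log-fold-changes between samples (columns)."""
    n_genes, n_samples = logcpm.shape
    top = min(top, n_genes)
    d = np.zeros((n_samples, n_samples))
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            sq = (logcpm[:, i] - logcpm[:, j]) ** 2
            d[i, j] = d[j, i] = np.sqrt(np.sort(sq)[-top:].mean())
    return d

def classical_mds(d, k=2):
    """Classical (Torgerson) MDS: embed a distance matrix in k dimensions."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering  # double-centered squared distances
    vals, vecs = np.linalg.eigh(b)                # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]              # keep the k largest
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Simulated data (hypothetical): 1000 genes x 12 samples.
rng = np.random.default_rng(1)
groups = np.array(["A1"] * 5 + ["A2"] * 3 + ["B"] * 4)
logcpm = rng.normal(5.0, 1.0, size=(1000, 1)) + rng.normal(0.0, 0.3, size=(1000, groups.size))
logcpm[:100, groups != "A1"] += 3.0  # A2 shares B's signature genes

coords = classical_mds(leading_logfc_distance(logcpm, top=500))
for g in ["A1", "A2", "B"]:
    print(g, coords[groups == g].mean(axis=0).round(2))
```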

What should I do when the differential analysis heatmap looks like this? Is it a good idea to remove those 9 samples from the analysis based only on the clustering? If not, any suggestions are welcome.

Thank you

Tags: RNA-Seq, R, heatmap, differentialanalysis, edger

There may be a batch effect. Were these samples sequenced in the same run? Were the RNA libraries prepared at the same time, with the same library kit? Was the same RNA extraction method used?

(comment by Nicolas Rosewick, 4.9 years ago)
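One quick way to follow up on these questions is to cross-tabulate the sample clusters against each technical variable (sequencing run, library prep date, kit, extraction method). If a cluster comes entirely from a single run or kit, a batch effect becomes a plausible explanation. This is a minimal Python sketch. The cluster names, run IDs and kit labels are all hypothetical.

```python
from collections import Counter, defaultdict

def crosstab(rows, cols):
    """Count samples for each (row label, column label) pair."""
    table = defaultdict(Counter)
    for r, c in zip(rows, cols):
        table[r][c] += 1
    return {r: dict(counts) for r, counts in table.items()}

def confounded(group, batch):
    """True if every group's samples come from a single batch level."""
    return all(len(counts) == 1 for counts in crosstab(group, batch).values())

# Hypothetical labels for 40 samples: 17 main A, 9 outlying A, 14 B.
cluster = ["A_main"] * 17 + ["A_out"] * 9 + ["B"] * 14
run = ["run1"] * 17 + ["run2"] * 23   # hypothetical sequencing runs
kit = ["kitX", "kitY"] * 20           # hypothetical library kits

for name, batch in [("run", run), ("kit", kit)]:
    print(name, crosstab(cluster, batch), "confounded:", confounded(cluster, batch))
```

In this made-up example, the outlying A samples share run2 with all B samples, so the run would be worth investigating before treating the split as biology.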

score 2 · Answer 1 · 2019-05-22

Cancer subtypes can be a difficult thing. Many cancer types can be broken down into quite distinct sub-subtypes. It's also possible that the criteria clinicians have used to assign subtypes are not as clear-cut as they would like. For example, endometrial cancer samples can traditionally be typed under two different systems: Type I vs Type II, and endometrioid vs serous. However, we find that the Type I/II classification doesn't make much sense from a molecular point of view.

The question is what you are trying to gain from this analysis. If you want to know "what are the average gene expression differences between samples of these two classes?", then use the DE results and don't worry about the heatmap (I'm not really a fan of making heatmaps just because you need a figure of some sort).

On the other hand, if you are interested in discovering the heterogeneity underlying cancer, of which the current subtyping schemes are but one example, then I would start with some sort of clustering, identify clusters, and then identify the genes driving them (either from the gene dendrogram, or by running DE between the de novo clusters). You might then find that one of the clusters corresponds to a traditional "sub-class".
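A minimal sketch of this cluster-first workflow, in Python with NumPy and SciPy on simulated data (the data and its three hidden groups are hypothetical): samples are hierarchically clustered on row-centered log-expression, the tree is cut into k clusters, and genes are ranked by a Welch t-statistic between one de novo cluster and the rest. The Welch t is only a simple stand-in; on real counts you would rather rerun edgeR with the cluster labels as the design factor. Also keep in mind that p-values from testing clusters that were found in the same data are optimistic.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_samples(logexpr, k):
    """Hierarchically cluster samples (columns) and cut the tree into k clusters."""
    centered = logexpr - logexpr.mean(axis=1, keepdims=True)  # row-center genes, as in a heatmap
    z = linkage(centered.T, method="average", metric="correlation")
    return fcluster(z, t=k, criterion="maxclust")

def welch_t(logexpr, in_group):
    """Per-gene Welch t-statistic: samples in `in_group` vs all other samples."""
    a = logexpr[:, in_group]
    b = logexpr[:, ~in_group]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / se

# Simulated log-expression (hypothetical): 1000 genes x 12 samples, 3 hidden groups of 4.
rng = np.random.default_rng(2)
logexpr = rng.normal(5.0, 1.0, size=(1000, 1)) + rng.normal(0.0, 0.5, size=(1000, 12))
logexpr[:100, 4:8] += 4.0      # group 2 signature genes
logexpr[100:200, 8:12] += 4.0  # group 3 signature genes

labels = cluster_samples(logexpr, k=3)
in_cluster = labels == labels[4]  # the de novo cluster that contains sample 4
t = welch_t(logexpr, in_cluster)
top_genes = np.argsort(t)[::-1][:10]  # genes most up in that cluster
print(labels, top_genes)
```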

score 0 · Answer 2 · 2019-05-24

Your heatmap is difficult to interpret, but it seems fine. As i.sudbery mentioned, you most likely have 2 sub-subtypes within subtype A.

A good way to check this is to display the column dendrogram and cut it into 3 or 4 clusters. The order of the clusters is arbitrary (you can swap them); what you mainly need to check is whether each cluster is composed of samples of a single subtype.
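In R with pheatmap, the column tree can be cut with the cutree_cols argument. The same check can be sketched in Python with SciPy: cut the sample dendrogram into k clusters, then tabulate the subtype labels within each cluster. The simulated data, the average linkage and the correlation distance are assumptions for illustration (pheatmap's own defaults differ), so this is a sketch of the idea, not a reproduction of the heatmap.

```python
import numpy as np
from collections import Counter
from scipy.cluster.hierarchy import linkage, fcluster

def cut_and_tabulate(logexpr, subtype, k):
    """Cut the sample dendrogram into k clusters and count subtypes per cluster."""
    centered = logexpr - logexpr.mean(axis=1, keepdims=True)  # row-center genes
    z = linkage(centered.T, method="average", metric="correlation")
    clusters = fcluster(z, t=k, criterion="maxclust")
    table = {
        int(c): Counter(s for s, cl in zip(subtype, clusters) if cl == c)
        for c in np.unique(clusters)
    }
    return clusters, table

def is_pure(table):
    """True if every cluster contains a single subtype."""
    return all(len(counts) == 1 for counts in table.values())

# Simulated data (hypothetical): samples 4-7 are subtype A but share part of B's signature.
subtype = ["A"] * 8 + ["B"] * 4
rng = np.random.default_rng(3)
logexpr = rng.normal(5.0, 1.0, size=(1000, 1)) + rng.normal(0.0, 0.5, size=(1000, 12))
logexpr[:100, 4:12] += 4.0     # genes shared by samples 4-7 and subtype B
logexpr[100:200, 8:12] += 4.0  # B-specific genes

clusters, table = cut_and_tabulate(logexpr, subtype, k=3)
for c, counts in table.items():
    print(c, dict(counts), "pure" if len(counts) == 1 else "mixed")
```

Here subtype A splits into two pure clusters and B forms the third. A cluster mixing A and B samples would instead suggest that the subtype labels do not follow the expression structure.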

Below is an example built with R and pheatmap, with batch and subgroup annotation tracks on the columns. It shows 4 subgroups, with several sub-subgroups highlighted by 11 clusters.