Question: RNA-Seq Data Quality Assessment - Heatmap and PCA Interpretation
9 weeks ago by
Aynur40 wrote:

I am following a STAR-HTSeq-DESeq2 pipeline for my mouse RNA-Seq data analysis, and I am concerned about the heatmap and PCA results. In particular, sample b clusters with the control group, which I did not expect. Am I missing something here? a, b, c and d are different treatment conditions, and each one has two biological replicates. Here are my plots: [Figure: Heatmap for samples] [Figure: PCA plot for samples]

Should I be concerned about sample b? How should I interpret these plots? Any advice or article recommendations would be appreciated. Thank you very much.

rna-seq next-gen R sequence • 236 views
ADD COMMENTlink modified 9 weeks ago by antonioggsousa1.5k • written 9 weeks ago by Aynur40
9 weeks ago by
antonioggsousa1.5k wrote:


In my opinion, this means that, among your treatments, treatment b is the most similar to the control condition. Assuming that you transformed your data before doing these analyses, using a vst or rlog transformation, the gene expression profiles of treatment b and the control are virtually the same. So your treatment b has no visible effect on the overall gene expression profile.

It may have an effect, but one so small that it is difficult to quantify relative to the control (a higher number of replicates might help), or the difference between treatment b and the control is confined to a small number of genes, so these techniques do not detect it.

Regarding PCA, you might want to read this post. Essentially, PCA tries to capture the variability in your gene expression profiles. The plot shows only the first two principal components (PCs), which are the ones that explain the most variability, in your case 90%. If two samples (points) are close, they have similar gene expression profiles. Be careful to read the plot along the x-axis and the y-axis separately, since each axis explains a different source of variability in your data.
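To make the PCA reading concrete, here is a minimal sketch on simulated data. It is not your data and not the DESeq2 `plotPCA` implementation: the sample names, gene counts, effect sizes and the power-iteration eigen solver are all assumptions chosen for illustration, and it is written in plain Python rather than R. In this toy design, treatment b has no effect while c and d each shift 60 of 200 genes.

```python
import math
import random

def center_genes(matrix):
    """Subtract each gene's (column's) mean across samples."""
    n_samples = len(matrix)
    means = [sum(col) / n_samples for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]

def gram_matrix(rows):
    """Sample-by-sample matrix of dot products between centered profiles."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]

def top_eigenpairs(sym, k, iters=300):
    """Largest k eigenpairs of a symmetric matrix via power iteration with deflation."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        vec = [float(i + 1) for i in range(n)]  # fixed, non-constant start vector
        for _ in range(iters):
            nxt = [sum(w * v for w, v in zip(row, vec)) for row in work]
            norm = math.sqrt(sum(x * x for x in nxt))
            vec = [x / norm for x in nxt]
        val = sum(v * sum(w * u for w, u in zip(row, vec)) for v, row in zip(vec, work))
        pairs.append((val, vec))
        # Remove the component just found so the next pass finds the next PC.
        work = [[work[i][j] - val * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return pairs

def pca_2d(matrix):
    """Return PC1/PC2 scores per sample and the fraction of variance each PC explains."""
    gram = gram_matrix(center_genes(matrix))
    total = sum(gram[i][i] for i in range(len(gram)))
    pairs = top_eigenpairs(gram, 2)
    explained = [val / total for val, _ in pairs]
    scores = [[math.sqrt(val) * vec[i] for val, vec in pairs] for i in range(len(gram))]
    return scores, explained

# Simulated log-scale expression (hypothetical values, for illustration only).
rng = random.Random(0)
N_GENES = 200
baseline = [rng.uniform(4.0, 12.0) for _ in range(N_GENES)]

def simulate(shifted_genes, shift):
    return [b + (shift if g in shifted_genes else 0.0) + rng.gauss(0.0, 0.2)
            for g, b in enumerate(baseline)]

design = {
    "control": (range(0), 0.0),
    "b": (range(0), 0.0),          # assumption: b does not change any gene
    "c": (range(0, 60), 3.0),
    "d": (range(60, 120), -3.0),
}
labels, data = [], []
for cond, (genes, shift) in design.items():
    for rep in (1, 2):
        labels.append(f"{cond}_{rep}")
        data.append(simulate(genes, shift))

scores, explained = pca_2d(data)
print("variance explained by PC1, PC2:", [round(e, 3) for e in explained])
for label, (pc1, pc2) in zip(labels, scores):
    print(f"{label:10s} PC1={pc1:8.2f} PC2={pc2:8.2f}")
```

In a run of this sketch, the b replicates should land next to the control replicates while c and d move away along the axes, which is the same pattern as in the question. The sign of each axis is arbitrary; only the relative positions matter.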

For the heatmap, the interpretation depends on the distance you use. It seems that you've used a correlation metric, i.e., Pearson or Spearman, so it represents the correlation between gene expression profiles. If two samples cluster closer together, they are more correlated with each other than with the others. You can look at the color key to see whether the correlation is high or low. If it is close to one, the samples are virtually the same. Note, though, that with a large number of genes being compared it is not difficult to get high correlation scores by chance.
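As a rough illustration of why sample-to-sample correlation can stay high even when a treatment changes some genes, here is a small sketch on simulated values. Everything in it is an assumption made for the example (1000 genes, 10 of them shifted, the noise level, the variable names); it is plain Python, not the code behind your heatmap.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Simulated log-scale expression (hypothetical values, for illustration only).
rng = random.Random(1)
N_GENES = 1000
N_CHANGED = 10  # assumption: the treatment shifts only these genes

baseline = [rng.uniform(2.0, 14.0) for _ in range(N_GENES)]
control = [b + rng.gauss(0.0, 0.3) for b in baseline]
replicate = [b + rng.gauss(0.0, 0.3) for b in baseline]  # second control replicate
treated = [b + rng.gauss(0.0, 0.3) for b in baseline]
for g in range(N_CHANGED):
    treated[g] += 4.0  # strong change, but in 1% of the genes only

r_replicate = pearson(control, replicate)
r_treated = pearson(control, treated)
print(f"control vs replicate: r = {r_replicate:.4f}")
print(f"control vs treated:   r = {r_treated:.4f}")
```

Both correlations should come out close to one, because the large spread of expression levels across genes dominates the calculation. That is why a treatment affecting few genes can look "the same as control" on a correlation heatmap.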

I hope this helps,


ADD COMMENTlink modified 9 weeks ago • written 9 weeks ago by antonioggsousa1.5k

Thank you for taking the time to give this thorough answer. I appreciate it.

ADD REPLYlink written 9 weeks ago by Aynur40

Please accept the answer if it answers your question and solves your problem.

ADD REPLYlink written 9 weeks ago by RamRS30k