RNA-Seq Data Quality Assessment: Heatmap and PCA Interpretation
3.7 years ago
Aynur ▴ 60

I am following a STAR-HTSeq-DESeq2 pipeline for my mouse RNA-Seq data analysis, and I am concerned about the heatmap and PCA results. Specifically, I am concerned about sample b: I expected that it would not cluster with the control group. Am I missing something here? a, b, c and d are different treatment conditions, and each one has two biological replicates. Here are my plots.

[Figure: Heatmap for samples]
[Figure: PCA plot for samples]

Should I be concerned about sample b? How should I interpret these plots? Any advice or article recommendation is appreciated. Thank you very much.

sequence rna-seq R next-gen • 4.7k views
3.7 years ago

Hi,

In my opinion, this means that, among your treatments, treatment b is the most similar to the control condition; it is virtually the same as, or at least quite similar to, the control. Assuming that you transformed your normalized counts before doing these analyses, using the vst or rlog transformation, the overall gene expression profile of treatment b is virtually the same as that of the control. So your treatment b appears to have no detectable effect on the overall gene expression profile.

It may have an effect, but one so small that it is difficult to quantify relative to the control (a higher number of replicates might help), or the difference between treatment b and the control involves only a small number of genes, so these techniques do not detect it.
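To make that last point concrete, here is a minimal sketch in plain Python (standard library only, not DESeq2). Everything in it is an assumption for illustration: the expression values are simulated, and the number of genes, the noise level and the 3 log2-unit shift are made up. It shows that a strong change in 1% of the genes barely moves the global correlation between two samples.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is reproducible
n_genes = 2000          # hypothetical number of genes
n_changed = 20          # hypothetical: 1% of genes respond to the treatment

# Simulated log2-scale expression for a control sample (made-up values).
control = [rng.uniform(0.0, 15.0) for _ in range(n_genes)]

# Treated sample: the same profile plus small technical noise...
treated = [value + rng.gauss(0.0, 0.3) for value in control]
# ...and a strong 3 log2-unit (8-fold) increase in the first 20 genes only.
for i in range(n_changed):
    treated[i] += 3.0

r = pearson(control, treated)
print(f"Pearson r between control and treated: {r:.4f}")
```

The correlation stays very close to 1 even though 20 genes changed 8-fold, which is why a gene-by-gene differential expression test can still find differences that a sample-level heatmap or PCA does not show.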

Regarding PCA, you might want to read this post. Essentially, PCA tries to capture the variability in your gene expression profiles. The plot shows only the first two principal components (PCs), which are the ones that explain the most variability, in your case 90%. If two samples/points are close, they have similar gene expression profiles. You need to be careful to read the plot along the x-axis and the y-axis separately, since each axis explains a different source of variability in your data.
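As a rough sketch of where the "percentage of variance explained" on each axis comes from, the following plain-Python example (standard library only) runs a PCA on a tiny made-up matrix. The sample names and values are hypothetical, and the eigenvalues are obtained by power iteration with deflation purely to keep the example dependency-free; real tools, such as DESeq2's plotPCA, rely on a proper decomposition routine instead.

```python
def center_columns(matrix):
    """Subtract each gene's mean across samples (rows = samples, columns = genes)."""
    n_samples = len(matrix)
    means = [sum(col) / n_samples for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def gram(matrix):
    """Sample-by-sample matrix of dot products between centered profiles."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in matrix] for r1 in matrix]


def top_eigenpairs(sym, k, iters=500):
    """Top-k (eigenvalue, eigenvector) pairs of a symmetric positive
    semi-definite matrix, via power iteration with deflation."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _component in range(k):
        vec = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-zero start
        norm = sum(v * v for v in vec) ** 0.5
        vec = [v / norm for v in vec]
        for _step in range(iters):
            nxt = [sum(w * v for w, v in zip(row, vec)) for row in work]
            norm = sum(v * v for v in nxt) ** 0.5
            if norm < 1e-12:
                break
            vec = [v / norm for v in nxt]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        applied = [sum(w * v for w, v in zip(row, vec)) for row in work]
        value = sum(a * v for a, v in zip(applied, vec))
        pairs.append((value, vec))
        # Deflate: remove this component before searching for the next one.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


# Hypothetical transformed expression values: 4 samples x 4 genes.
samples = ["ctrl_1", "ctrl_2", "trt_1", "trt_2"]
expr = [
    [5.0, 8.0, 2.0, 7.0],
    [5.2, 7.9, 2.1, 7.4],
    [9.0, 4.0, 2.0, 7.1],
    [9.1, 4.2, 2.2, 7.5],
]

centered = center_columns(expr)
g = gram(centered)
total_variance = sum(g[i][i] for i in range(len(g)))  # trace = total sum of squares

pairs = top_eigenpairs(g, 2)
ratios = [value / total_variance for value, _vec in pairs]

# Sample coordinates on PC1: eigenvector scaled by the square root of its eigenvalue.
scores_pc1 = [pairs[0][0] ** 0.5 * v for v in pairs[0][1]]

for i, ratio in enumerate(ratios, start=1):
    print(f"PC{i} explains {ratio:.1%} of the variance")
for name, score in zip(samples, scores_pc1):
    print(f"{name}: PC1 score {score:+.2f}")
```

In this toy data almost all of the variance sits on PC1, and the controls and treated samples land on opposite sides of it. A separation of the same size along PC2 would correspond to a much smaller share of the total variability, which is why the two axes should not be read as equivalent.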

As for the heatmap, it depends on the distance you use. It seems that you've used a correlation metric, i.e., Pearson or Spearman, so it represents the correlation between gene expression profiles. If two samples are closer, they are more correlated with each other than with the others. You can look at the color key to see whether the correlation is high or low. If it is close to one, the profiles are virtually the same. Note, though, that with a high number of genes being compared it is not difficult to get high correlation scores by chance.
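For reference, a sample-to-sample correlation heatmap is just a colored view of a matrix like the one computed below. This is a minimal plain-Python sketch (standard library only); the sample names and expression values are hypothetical, chosen so that a treatment "b" resembles the control while a treatment "c" does not, and Pearson correlation is assumed as the metric.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical transformed expression profiles (6 genes per sample).
profiles = {
    "ctrl_1": [5.0, 8.0, 2.0, 7.0, 3.0, 6.0],
    "ctrl_2": [5.2, 7.9, 2.1, 7.4, 3.1, 5.8],
    "b_1": [5.1, 8.1, 2.3, 7.2, 2.9, 6.1],
    "b_2": [4.9, 7.8, 2.0, 7.1, 3.2, 6.2],
    "c_1": [9.0, 4.0, 2.0, 7.1, 6.0, 3.0],
    "c_2": [9.1, 4.2, 2.2, 7.5, 6.2, 2.8],
}
names = list(profiles)

# Pairwise correlation matrix: this is what the heatmap colors encode.
corr = {a: {b: pearson(profiles[a], profiles[b]) for b in names} for a in names}

# A common way to cluster on correlation is to use 1 - r as the distance.
dist = {a: {b: 1.0 - corr[a][b] for b in names} for a in names}

# Closest other sample for each sample, according to that distance.
nearest = {
    a: min((b for b in names if b != a), key=lambda b: dist[a][b])
    for a in names
}

print("        " + " ".join(f"{name:>7}" for name in names))
for a in names:
    print(f"{a:>7} " + " ".join(f"{corr[a][b]:7.3f}" for b in names))
for a in names:
    print(f"{a} is closest to {nearest[a]}")
```

With these made-up numbers the "b" samples correlate with the controls almost as strongly as the control replicates correlate with each other, while the "c" samples only pair with each other. That is the pattern that puts a treatment in the same cluster as the control.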

I hope this helps,

António

Thank you for taking the time to give this thorough answer. I appreciate it.

Please accept the answer if it answers your question and solves your problem.
