Question

Heatmap of RNA-seq data with z-score scaling

5 months ago

giovanna • 0

I have a question about heatmap visualization of bulk RNA-seq gene expression data from different datasets, using z-score row scaling. With the same gene list, a heatmap built only from samples of a single dataset highlights differences in gene expression between patients and controls (Figure 1). When the matrix also includes samples from other datasets, the differences between patients and controls seem to disappear, and instead the samples from different datasets show opposite expression trends (Figure 2). Can you give me some suggestions on how to solve this problem?

RNA-seq • 1.1k views

ADD COMMENT • link updated 5 months ago by swbarnes2 15k • written 5 months ago by giovanna • 0

score 2 · Answer 1 · 2025-05-21

The z-score is the raw value minus the mean across all samples in the heatmap, divided by their standard deviation. With row scaling, that mean and standard deviation are computed per gene, so adding samples changes every z-score in the row. You should choose the samples you want to compare, because z-scores depend on which samples are included and cannot be compared from one heatmap to another. Adding samples from a different RNA-seq dataset may not be appropriate because of inherent batch and biological differences, but it all depends on the nature of your datasets.
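To make the point concrete, here is a minimal sketch in Python with NumPy. The expression values, the size of the batch offset, and the helper name `row_zscore` are all made up for illustration; they are not taken from the original poster's data. It scales one gene within a single dataset, then again after samples from a second, shifted dataset are appended.

```python
import numpy as np


def row_zscore(matrix):
    """Scale each row (gene) to mean 0 and SD 1 across its columns (samples)."""
    mean = matrix.mean(axis=1, keepdims=True)
    sd = matrix.std(axis=1, ddof=1, keepdims=True)
    return (matrix - mean) / sd


# Hypothetical log-expression values for one gene (illustrative numbers only).
# Dataset A: 3 controls followed by 3 patients.
dataset_a = np.array([[5.0, 5.2, 4.8, 7.0, 7.2, 6.8]])
# Dataset B: same design, shifted by an assumed large batch offset.
dataset_b = dataset_a + 10.0

# Scaling computed from dataset A alone.
z_a_only = row_zscore(dataset_a)

# Scaling computed after appending dataset B as extra columns.
combined = np.hstack([dataset_a, dataset_b])
z_combined = row_zscore(combined)

# Patient-minus-control gap for the dataset A samples under each scaling.
gap_alone = z_a_only[0, 3:6].mean() - z_a_only[0, :3].mean()
gap_combined = z_combined[0, 3:6].mean() - z_combined[0, :3].mean()

print("gap within dataset A, scaled alone:   ", round(gap_alone, 2))
print("gap within dataset A, scaled combined:", round(gap_combined, 2))
print("dataset A z-scores in combined matrix:", np.round(z_combined[0, :6], 2))
print("dataset B z-scores in combined matrix:", np.round(z_combined[0, 6:], 2))
```

In this toy case the patient versus control gap in dataset A shrinks once dataset B is included, and all dataset A samples land below zero while all dataset B samples land above it. That is the same pattern described in the question: the between-dataset shift dominates the row scaling.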

score 1 · Answer 2 · 2025-05-21

5 months ago

swbarnes2 15k

There isn't really a problem. Your data is what it is, but I think adding data from a totally different dataset is a bad idea. RNA-seq is very sensitive to batch effects, and those batch effects are masking the comparisons you want to highlight. Just don't combine your data like that.
