How to work out Z score for heatmaps for RNA seq dataset
2
0
Entering edit mode
2.4 years ago
v.johnson ▴ 20

I am trying to generate some heatmaps for my RNA seq dataset but struggling to work out how to calculate the z score. Can anyone give me any pointers on how to calculate please. I have;

Gene_DE.txt file which includes; baseMean, logFC, lfcSE, stat, pvalue and FDR.

Also have featureCounts .txt which includes length and count data.

Any pointers would be amazing! Thank you

R • 9.8k views
11
Entering edit mode
2.4 years ago
ATpoint 77k

Usually people plot the normalzed counts on the log2 scale, for the differential genes, and these converted to Z scale:

You seem to use DESeq2, so one could directly use the output from vst transformation:

vsd <- assay(vst(dds))
Z <- t(scale(t(vsd)))


So what does that do? We apply scale which is the actual Z-scoring function. In R data.frames and matrices are basically lists of vectors (every column is a vector) but we want rowwise (per gene) Z-scores. scale by default (like any operation on matrices / dfs) operates column-wise though, so we first transpose the matrix, then scale it, and then transpose it back so we again have a column=sample and row=gene matrix (or data.frame). That's it.

Then subset Z to the genes you want to plot and make the heatmap.

Does that make sense to you?

0
Entering edit mode

Thank you for your kind reply. I am using DESeq2, I will have a go with this! Thank you!

0
Entering edit mode

thank you ATpoint! I followed your post and I did get the scaled Z matrix. However, I failed to generate a heatmap from the Z matrix with code pheatmap(Z, clustering_distance_rows = "correlation", clustering_distance_cols = "manhattan"). The error message is that "Error in hclust(d, method = method) : NA/NaN/Inf in foreign function call (arg 10)". It seems in the Z matrix there are many rows with NA/NaN/Inf. I wonder if you have good suggestions to remove the NA/NaN/Inf from the rows of Z matrix? Is it better to remove these rows first from dds and then do the vst transformation and Z sacling?

1
Entering edit mode

You probably have NaNs in there because some rows had the same value so Z goes NaN. Remove them, e.g. with complete.cases()

0
Entering edit mode

Thank you very much ATpoint! I followed your suggestions and use complete.cases() to remove NaN from the Z matrix, and now the pheatmap works! It seems to me that the heatmap generated from the Z matrix contains all the genes, is there a way to generate a heatmap so that only part of the genes (say top 100 genes ranked with the lowest P-value)? I guess I need to first generate a Z matrix with only the preferable genes.

0
Entering edit mode

Thank you for posting. I was searching for an answer on whether it is appropriate to use transformed counts to calculate Z-Score. I assume that rlog and vst transformed counts can both be used as input for the Z-Score calculation?

1
Entering edit mode

Yes, they're good and commonly-used choices as a) they're properly normalized and b) on log2 scale. By the way, my answer uses vst actually if you look closely.

1
Entering edit mode
2.4 years ago

You can use 'scale' option of pheatmap package in R to use Z-score in heatmap.