How to work out Z score for heatmaps for RNA seq dataset
2
0
Entering edit mode
2.9 years ago
v.johnson ▴ 20

I am trying to generate some heatmaps for my RNA seq dataset but struggling to work out how to calculate the z score. Can anyone give me any pointers on how to calculate please. I have;

Gene_DE.txt file which includes; baseMean, logFC, lfcSE, stat, pvalue and FDR.

Also have featureCounts .txt which includes length and count data.

Any pointers would be amazing! Thank you

R • 13k views
ADD COMMENT
12
Entering edit mode
2.9 years ago
ATpoint 82k

Usually people plot the normalzed counts on the log2 scale, for the differential genes, and these converted to Z scale:

You seem to use DESeq2, so one could directly use the output from vst transformation:

vsd <- assay(vst(dds))
Z <- t(scale(t(vsd)))

So what does that do? We apply scale which is the actual Z-scoring function. In R data.frames and matrices are basically lists of vectors (every column is a vector) but we want rowwise (per gene) Z-scores. scale by default (like any operation on matrices / dfs) operates column-wise though, so we first transpose the matrix, then scale it, and then transpose it back so we again have a column=sample and row=gene matrix (or data.frame). That's it.

Then subset Z to the genes you want to plot and make the heatmap.

Does that make sense to you?

ADD COMMENT
0
Entering edit mode

Thank you for your kind reply. I am using DESeq2, I will have a go with this! Thank you!

ADD REPLY
0
Entering edit mode

thank you ATpoint! I followed your post and I did get the scaled Z matrix. However, I failed to generate a heatmap from the Z matrix with code pheatmap(Z, clustering_distance_rows = "correlation", clustering_distance_cols = "manhattan"). The error message is that "Error in hclust(d, method = method) : NA/NaN/Inf in foreign function call (arg 10)". It seems in the Z matrix there are many rows with NA/NaN/Inf. I wonder if you have good suggestions to remove the NA/NaN/Inf from the rows of Z matrix? Is it better to remove these rows first from dds and then do the vst transformation and Z sacling?

ADD REPLY
1
Entering edit mode

You probably have NaNs in there because some rows had the same value so Z goes NaN. Remove them, e.g. with complete.cases()

ADD REPLY
0
Entering edit mode

Thank you very much ATpoint! I followed your suggestions and use complete.cases() to remove NaN from the Z matrix, and now the pheatmap works! It seems to me that the heatmap generated from the Z matrix contains all the genes, is there a way to generate a heatmap so that only part of the genes (say top 100 genes ranked with the lowest P-value)? I guess I need to first generate a Z matrix with only the preferable genes.

ADD REPLY
0
Entering edit mode

Thank you for posting. I was searching for an answer on whether it is appropriate to use transformed counts to calculate Z-Score. I assume that rlog and vst transformed counts can both be used as input for the Z-Score calculation?

ADD REPLY
1
Entering edit mode

Yes, they're good and commonly-used choices as a) they're properly normalized and b) on log2 scale. By the way, my answer uses vst actually if you look closely.

ADD REPLY
1
Entering edit mode
2.9 years ago

You can use 'scale' option of pheatmap package in R to use Z-score in heatmap.

ADD COMMENT

Login before adding your answer.

Traffic: 3146 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6