I am trying to generate some heatmaps for my RNA seq dataset but struggling to work out how to calculate the z score. Can anyone give me any pointers on how to calculate please. I have;
Gene_DE.txt file which includes; baseMean, logFC, lfcSE, stat, pvalue and FDR.
Also have featureCounts .txt which includes length and count data.
Any pointers would be amazing! Thank you
Thank you for your kind reply. I am using DESeq2, I will have a go with this! Thank you!
thank you ATpoint! I followed your post and I did get the scaled Z matrix. However, I failed to generate a heatmap from the Z matrix with code
pheatmap(Z, clustering_distance_rows = "correlation", clustering_distance_cols = "manhattan")
. The error message is that "Error in hclust(d, method = method) : NA/NaN/Inf in foreign function call (arg 10)". It seems in the Z matrix there are many rows with NA/NaN/Inf. I wonder if you have good suggestions to remove the NA/NaN/Inf from the rows of Z matrix? Is it better to remove these rows first from dds and then do the vst transformation and Z sacling?You probably have NaNs in there because some rows had the same value so Z goes NaN. Remove them, e.g. with
complete.cases()
Thank you very much ATpoint! I followed your suggestions and use
complete.cases()
to remove NaN from the Z matrix, and now the pheatmap works! It seems to me that the heatmap generated from the Z matrix contains all the genes, is there a way to generate a heatmap so that only part of the genes (say top 100 genes ranked with the lowest P-value)? I guess I need to first generate a Z matrix with only the preferable genes.Thank you for posting. I was searching for an answer on whether it is appropriate to use transformed counts to calculate Z-Score. I assume that rlog and vst transformed counts can both be used as input for the Z-Score calculation?
Yes, they're good and commonly-used choices as a) they're properly normalized and b) on log2 scale. By the way, my answer uses vst actually if you look closely.