Question

How to work out Z score for heatmaps for RNA seq dataset

0

Entering edit mode

2.9 years ago

v.johnson ▴ 20

I am trying to generate some heatmaps for my RNA seq dataset but struggling to work out how to calculate the z score. Can anyone give me any pointers on how to calculate please. I have;

Gene_DE.txt file which includes; baseMean, logFC, lfcSE, stat, pvalue and FDR.

Also have featureCounts .txt which includes length and count data.

Any pointers would be amazing! Thank you

R • 13k views

ADD COMMENT • link updated 11 months ago by yudibedi • 0 • written 2.9 years ago by v.johnson ▴ 20

1

Entering edit mode

2.9 years ago

Rashedul Islam ▴ 470

You can use 'scale' option of pheatmap package in R to use Z-score in heatmap.

ADD COMMENT • link 2.9 years ago by Rashedul Islam ▴ 470

score 12 · Accepted Answer · 2021-05-21

12

Entering edit mode

2.9 years ago

ATpoint 82k

Usually people plot the normalzed counts on the log2 scale, for the differential genes, and these converted to Z scale:

You seem to use DESeq2, so one could directly use the output from vst transformation:

vsd <- assay(vst(dds))
Z <- t(scale(t(vsd)))

So what does that do? We apply scale which is the actual Z-scoring function. In R data.frames and matrices are basically lists of vectors (every column is a vector) but we want rowwise (per gene) Z-scores. scale by default (like any operation on matrices / dfs) operates column-wise though, so we first transpose the matrix, then scale it, and then transpose it back so we again have a column=sample and row=gene matrix (or data.frame). That's it.

Then subset Z to the genes you want to plot and make the heatmap.

Does that make sense to you?

ADD COMMENT • link 2.9 years ago by ATpoint 82k

0

Entering edit mode

Thank you for your kind reply. I am using DESeq2, I will have a go with this! Thank you!

ADD REPLY • link 2.9 years ago by v.johnson ▴ 20

0

Entering edit mode

thank you ATpoint! I followed your post and I did get the scaled Z matrix. However, I failed to generate a heatmap from the Z matrix with code pheatmap(Z, clustering_distance_rows = "correlation", clustering_distance_cols = "manhattan"). The error message is that "Error in hclust(d, method = method) : NA/NaN/Inf in foreign function call (arg 10)". It seems in the Z matrix there are many rows with NA/NaN/Inf. I wonder if you have good suggestions to remove the NA/NaN/Inf from the rows of Z matrix? Is it better to remove these rows first from dds and then do the vst transformation and Z sacling?

ADD REPLY • link 2.5 years ago by xiaoleiusc ▴ 140

1

Entering edit mode

You probably have NaNs in there because some rows had the same value so Z goes NaN. Remove them, e.g. with complete.cases()

ADD REPLY • link 2.5 years ago by ATpoint 82k

0

Entering edit mode

Thank you very much ATpoint! I followed your suggestions and use complete.cases() to remove NaN from the Z matrix, and now the pheatmap works! It seems to me that the heatmap generated from the Z matrix contains all the genes, is there a way to generate a heatmap so that only part of the genes (say top 100 genes ranked with the lowest P-value)? I guess I need to first generate a Z matrix with only the preferable genes.

ADD REPLY • link 2.5 years ago by xiaoleiusc ▴ 140

0

Entering edit mode

Thank you for posting. I was searching for an answer on whether it is appropriate to use transformed counts to calculate Z-Score. I assume that rlog and vst transformed counts can both be used as input for the Z-Score calculation?

ADD REPLY • link 15 months ago by vanbelj ▴ 40

1

Entering edit mode

Yes, they're good and commonly-used choices as a) they're properly normalized and b) on log2 scale. By the way, my answer uses vst actually if you look closely.

ADD REPLY • link 15 months ago by ATpoint 82k