14 months ago
Laurie
Dear Biostars,
I'm working on RNA-seq data. I have rlog-transformed counts (from the rlog transformation in the DESeq2 package). I want to use them as input for creating heatmaps and for subsequent clustering to identify potential functional groups of transcripts.
I'm wondering whether these rlog-transformed counts should be scaled before attempting any clustering, either hierarchical or k-means, or whether they should be used as they are. I can't really decide, so any insight would be hugely appreciated!
Thanks a lot!
The rlog transformation is similar to a log2 transformation for genes with high counts, while for genes with low counts it shrinks the values of the different samples toward each other. Variances are already roughly homoskedastic after the rlog transformation.
So if scaling means transforming the data to the Z-scale ("deviation from the mean of all samples for that gene"), and variances are already approximately the same, isn't it redundant?
I read this post before, and I'm surely missing something, this is why I'm asking here...
rlog and vst (and standard log2) still preserve expression-level differences, so you have genes with large and genes with low counts. The Z-scale measures deviation from the gene mean, so the magnitude of the counts is eliminated. These are two different concepts.
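To make the distinction concrete, here is a minimal sketch in Python with NumPy (not DESeq2 itself). The matrix `expr` is made-up toy data standing in for rlog-like values, and `row_zscore` is a hypothetical helper name. It shows two genes with the same pattern across samples but very different expression levels: they are far apart on the log scale, yet identical after per-gene Z-scaling.

```python
import numpy as np

# Toy, made-up log2-scale values (stand-in for rlog output).
# Rows are genes, columns are samples.
expr = np.array([
    [12.0, 12.5, 13.0, 13.5],  # highly expressed gene
    [2.0, 2.5, 3.0, 3.5],      # lowly expressed gene, same pattern across samples
])

def row_zscore(m):
    """Center each gene (row) on its mean and divide by its standard deviation.

    Assumes no gene has zero variance; such genes would need to be
    filtered out beforehand to avoid division by zero.
    """
    mean = m.mean(axis=1, keepdims=True)
    sd = m.std(axis=1, ddof=1, keepdims=True)
    return (m - mean) / sd

z = row_zscore(expr)

# Euclidean distance between the two genes before and after Z-scaling.
dist_raw = np.linalg.norm(expr[0] - expr[1])
dist_z = np.linalg.norm(z[0] - z[1])

print("distance on log scale:", dist_raw)
print("distance on Z-scale:  ", dist_z)
```

On the unscaled values, a distance-based clustering would separate these two genes mainly because of their expression level; on the Z-scale they would group together because only the pattern across samples remains. Which behaviour is wanted depends on the question being asked of the clustering.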
OK, this is where I was completely lost.
Thanks for your help and answer.