Question: same experiment, different values with my heat map. Any help?
12 months ago by
Mozart190 wrote:

Hi there, I am struggling to get my head around something I may have forgotten. Essentially, the transformed values (i.e. the rlog ones) that I use in my heat map change according to the number of samples I consider, and I am wondering why this happens. Since I am sure I haven't explained myself clearly, let me rephrase:

I want to generate 2 heat maps: the first from the main comparison I am interested in (6 samples), and the second containing results from all samples in my dataset (the same 6 samples + 2).

I run the following in both cases:

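library(DESeq2)
library(matrixStats)  # provides rowVars()
dds <- DESeqDataSetFromTximport(txi.kallisto.tsv, table, ~condition)
dds <- DESeq(dds)
rld <- rlog(dds, blind=FALSE)  # regularised log transformation
# row indices of the 100 genes with the highest variance across samples
top_genes <- head(order(rowVars(assay(rld)), decreasing = TRUE), 100)
mat  <- assay(rld)[ top_genes, ]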

I obtain different values for the same genes in the 2 aforementioned cases. Is this because the regularised logarithmic transformation depends on the number of samples in the dataset?


Tags: heatmap, rna-seq
12 months ago by
ATpoint34k wrote:

This is normal and expected, given that the normalization factors and the model fit will be different if you add or remove samples. If you want to be independent of that, maybe use something like log2(FPKM+1). For visualization alone this is probably accurate enough. What do you want to show with the heatmaps?
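
To make the point above concrete, here is a minimal base-R sketch. It is not DESeq2's actual code: `size_factors()` is a simplified, hypothetical re-implementation of the median-of-ratios idea (it assumes no zero counts), the count matrix is made up, and log2(CPM+1) stands in for log2(FPKM+1) because the toy data has no gene lengths. It only illustrates that a normalization computed across samples shifts when a sample is added, while a per-sample scaling does not.

```r
# Toy raw counts: 4 genes x 3 samples (made-up numbers, illustration only)
counts_subset <- matrix(
  c( 10,  20,  30,
    100, 210, 290,
     50,  95, 160,
      5,  12,  14),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:3))
)

# Same samples plus one extra sample with a different expression profile
counts_all <- cbind(counts_subset, s4 = c(40, 100, 400, 2))

# Simplified median-of-ratios size factors (assumes no zero counts).
# The per-gene reference is a geometric mean over ALL samples in the matrix,
# so it changes whenever samples are added or removed.
size_factors <- function(counts) {
  log_geo_mean <- rowMeans(log(counts))
  apply(counts, 2, function(x) exp(median(log(x) - log_geo_mean)))
}

sf_subset <- size_factors(counts_subset)
sf_all    <- size_factors(counts_all)[colnames(counts_subset)]

# Size factors for s1..s3, without and with the extra sample
print(rbind(subset = sf_subset, all = sf_all))

# Per-sample scaling: each column is divided by its own library size only,
# so values for s1..s3 are unaffected by the presence of s4.
log_cpm <- function(counts) {
  log2(sweep(counts, 2, colSums(counts), "/") * 1e6 + 1)
}

cpm_subset <- log_cpm(counts_subset)
cpm_all    <- log_cpm(counts_all)[, colnames(counts_subset)]

# Check whether the per-sample values are the same in both settings
print(all.equal(cpm_subset, cpm_all))
```

The same reasoning applies to rlog and vst, which additionally fit parameters across all samples in the object, so their output for a given sample can change with the sample set.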


Thanks for the quick reply. I've always had this feeling! I just want to show the top variable genes in my dataset...that's it.

Another question: should I stick with the same kind of log transformation (either vst or rlog) for all of the plots in my experiment, or can I change the normalisation method each time (e.g. rlog for PCA and vst for the heat map)? Thanks!

written 12 months ago by Mozart190

I would not switch around, as there should be consistency. Use what you prefer (or vst if you have many samples and rlog is too slow), but do not mix at will, as they behave quite differently, especially for variable genes with low counts.

Alternatively, what I personally find more meaningful is to show only those genes that are significantly different, as high variability often comes from the mean-variance dependency for low-count genes. You could show the z-scored log2FCs for those with padj < 0.05. Still, if you prefer counts, do not mix methods and be consistent.
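
As a rough sketch of that filtering-then-scaling idea in base R: the snippet below keeps genes with padj < 0.05 and z-scores each remaining gene across samples. Applying the z-score to transformed expression values is one possible reading of the suggestion, not necessarily what was meant. The matrix `expr` and the vector `padj` are hypothetical stand-ins for a transformed expression matrix and the adjusted p-values from a results table.

```r
# Hypothetical inputs: transformed expression (genes x samples) and padj values
set.seed(1)
expr <- matrix(
  rnorm(5 * 6, mean = 8),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("s", 1:6))
)
padj <- c(gene1 = 0.001, gene2 = 0.20, gene3 = 0.03, gene4 = NA, gene5 = 0.04)

# Keep genes with padj < 0.05; NA padj values are treated as not significant
sig <- names(padj)[!is.na(padj) & padj < 0.05]

# Row-wise z-score: scale() works on columns, so transpose, scale, transpose back
z <- t(scale(t(expr[sig, , drop = FALSE])))

# Each retained gene now has mean 0 and standard deviation 1 across samples
print(round(z, 2))
```

The matrix `z` could then be passed to a heat map function with row scaling turned off, since the rows are already scaled.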

written 12 months ago by ATpoint34k

Thank you so much, this was a great help. I was facing the same issue.

written 12 months ago by stephannie.baker810