Question: same experiment, different values with my heat map. Any help?
18 months ago, Mozart wrote:

Hi there, I am struggling to wrap my head around something I may have forgotten. Essentially, the transformed values (i.e. the rlog ones) that I use in my heat map change depending on the number of samples I include, and I am wondering why this happens. Since I am sure I haven't explained myself clearly, let me rephrase:

I want to generate two heat maps: the first from the main comparison I am interested in (6 samples), and the second from all samples in my dataset (the same 6 samples plus 2 more).

By doing this, I obtain different values for the same genes in the two heat maps. Is this because the regularised logarithm transformation (rlog) gives different results depending on the number of samples in the dataset?


18 months ago, ATpoint wrote:

This is normal and expected, since the normalization factors and the model fit change when you add or remove samples. If you want values that do not depend on which samples are included, maybe use something like log2(FPKM+1). For visualization alone this is probably accurate enough. What do you want to show with the heatmaps?
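To make the dependence on the sample set concrete, here is a minimal Python sketch. It is not DESeq2 and it is far simpler than rlog: it only computes median-of-ratios size factors (the kind of size-factor normalization DESeq2 uses) on a made-up count table, once for three samples and once with a fourth sample added. The counts, the gene length, and all function names are assumptions for illustration only.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count table.

    counts: list of rows (one per gene), each a list of per-sample counts.
    """
    n_samples = len(counts[0])
    # Keep only genes with non-zero counts in every sample so logs are defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Per-gene log geometric mean, computed over the samples in THIS table.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def log2_normalized(counts):
    """log2(count / size_factor + 1) for every gene and sample."""
    sf = size_factors(counts)
    return [[math.log2(c / s + 1) for c, s in zip(row, sf)] for row in counts]


def log2_fpkm_plus1(count, gene_length_bp, library_size):
    """log2(FPKM + 1); uses only the sample's own library size."""
    fpkm = count * 1e9 / (gene_length_bp * library_size)
    return math.log2(fpkm + 1)


# Made-up counts: 4 genes x 4 samples (the 4th sample is the "extra" one).
counts_all = [
    [100, 120, 90, 400],
    [50, 60, 45, 30],
    [200, 210, 190, 900],
    [10, 12, 9, 80],
]
counts_subset = [row[:3] for row in counts_all]

# Same gene, same sample, same raw count: only the sample set differs.
subset_value = log2_normalized(counts_subset)[0][0]
all_value = log2_normalized(counts_all)[0][0]
print("size-factor normalized:", subset_value, "vs", all_value)

# The per-sample transform gives one value regardless of the other samples.
gene_length_bp = 2000  # hypothetical gene length
library_size = sum(row[0] for row in counts_all)  # total counts of sample 0
fpkm_value = log2_fpkm_plus1(counts_all[0][0], gene_length_bp, library_size)
print("log2(FPKM+1):", fpkm_value)
```

The first two printed values differ because the per-gene reference (the geometric mean across samples) changes once the fourth sample is included. The log2(FPKM+1) value uses only that sample's own library size, so it stays the same whichever other samples are in the table.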


Thanks for the quick reply. I've always had this feeling! I just want to show the top variable genes in my dataset...that's it.
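For reference, picking the top variable genes usually means ranking genes by their variance across samples on the transformed scale. A minimal Python sketch of that idea follows; the gene names, the values, and the function name are made up for illustration and are not taken from any real dataset or package.

```python
from statistics import variance


def top_variable_genes(expr, n):
    """Return the n gene names with the highest sample variance.

    expr: dict mapping gene name -> list of transformed values, one per sample.
    """
    ranked = sorted(expr, key=lambda gene: variance(expr[gene]), reverse=True)
    return ranked[:n]


# Made-up transformed expression values: 4 genes x 4 samples.
expr = {
    "geneA": [5.0, 5.1, 4.9, 5.0],
    "geneB": [2.0, 6.0, 3.0, 7.0],
    "geneC": [8.0, 8.5, 7.5, 8.0],
    "geneD": [1.0, 1.0, 4.0, 4.0],
}

top2 = top_variable_genes(expr, 2)
print(top2)
```

Because the ranking is computed from the transformed values, the selected genes can change when the transformation or the set of samples changes.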

Another question: should I stick with the same kind of log transformation (either vst or rlog) for all of the plots in my experiment, or can I change the normalisation method each time (e.g. rlog for PCA and vst for the heat map)? Thanks!

Reply written 18 months ago by Mozart

I would not switch around, as there should be consistency. Use whichever you prefer (or vst if you have many samples and rlog is too slow), but do not mix them at will, as they behave quite differently, especially for variable genes with low counts.

Alternatively, what I personally find more meaningful is to show only those genes that are significantly different, as high variability often comes from the mean-variance dependency for low-count genes. You could show the z-scored log2FCs for those with padj < 0.05. Still, if you prefer counts, do not mix methods and be consistent.
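A rough Python sketch of the filtering and z-scoring step, under the assumption that what gets z-scored is each gene's row of log2-scale values across samples. The gene names, the values, the padj numbers, and the function names are hypothetical. A missing padj is represented as None here and treated as not significant.

```python
from statistics import mean, stdev


def zscore_row(values):
    """Centre a gene's values on 0 and scale them to unit standard deviation."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        # A flat gene carries no pattern; return zeros instead of dividing by 0.
        return [0.0] * len(values)
    return [(v - m) / s for v in values]


def significant_zscores(expr, padj, alpha=0.05):
    """Row-wise z-scores for genes whose adjusted p-value is below alpha."""
    return {
        gene: zscore_row(values)
        for gene, values in expr.items()
        if padj.get(gene) is not None and padj[gene] < alpha
    }


# Made-up log2-scale values (4 samples per gene) and adjusted p-values.
expr = {
    "geneA": [5.0, 5.2, 8.1, 8.3],
    "geneB": [2.0, 6.0, 3.0, 7.0],
    "geneC": [9.0, 9.1, 6.0, 6.2],
    "geneD": [1.0, 1.5, 1.2, 1.1],
}
padj = {"geneA": 0.001, "geneB": 0.40, "geneC": 0.02, "geneD": None}

heatmap_rows = significant_zscores(expr, padj)
for gene, row in heatmap_rows.items():
    print(gene, [round(z, 2) for z in row])
```

Z-scoring per gene puts every row on the same scale, so the heat map shows the pattern across samples rather than differences in absolute expression level between genes.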

Reply written 18 months ago by ATpoint

Thank you so much, this was a great help. I was facing the same issue.

Reply written 18 months ago by stephannie.baker810