Question

Problem in heatmap visualisation

7.9 years ago

Ying ▴ 10

Hi guys,

I want to visualise samples from two experiments, e.g.
Experiment(1): 200 samples
Experiment(2): 15 samples

So I visualised both experiments together.

The problem is:
If I use 50 samples from the first experiment, I get Heatmap_1.
If I use 200 samples from the first experiment, I get Heatmap_2.
So why does the expression-level variation change when I use samples from the same experiment?

Heatmap_1: enter image description here

Heatmap_2: enter image description here
What I mean is that I lose variation on the heatmap when I use more samples. Is there any reason or explanation for this?

Thank you

RNA-Seq microarray heatmap • 1.8k views

ADD COMMENT • link updated 6.3 years ago by Biostar 20 • written 7.9 years ago by Ying ▴ 10

Sorry, the question is not very clear. What are you visualizing in the heatmap? Are they expression values across different samples, grouped by condition? If so, then yes: the more samples you use, the better you capture the distribution of the genes across the samples in each condition, and the better you can understand the dysregulation. I am guessing you are trying to visualize the differentially expressed genes across your samples. That is why you see different signals in red and green, indicating up- and down-regulation in your samples. Why do you want to restrict from 200 to 50? Is there a specific reason, such as outlier identification?

The more samples you include, the more robust the distribution of your genes across the samples, which gives you an understanding of how variable expression is between the 2 biological conditions among all samples.

Having said that, I expect you are visualizing normalized expression values from microarray data, right?

ADD REPLY • link 7.9 years ago by ivivek_ngs ★ 5.2k

Thanks. Yes, I am using microarray data, log2 normalised. I have two conditions: ConditionA is Experiment(1) and ConditionB is Experiment(2). If I use 50 samples from ConditionA I get Heatmap_1, and I can see some differences, as you can notice on the right side of the heatmap. But if I use 200 samples I can barely see any green and red differences, right? So I need to understand this. Isn't it better to use the same number of samples in both conditions to avoid losing such variation?

ADD REPLY • link 7.9 years ago by Ying ▴ 10

Ah, so you are trying to compare experiment 1 (200 samples) vs experiment 2 (15 samples), so the check is between CondA (expt1) and CondB (expt2). In this case the large number of samples in one condition is likely to skew the variability. In any case, perform a differential expression analysis between the 2 conditions rather than looking at the entire expression data. See if you have differentially expressed genes between the 2 conditions; if so, extract these genes across all samples in both conditions and plot them as a heatmap. You will still be able to see some variability between the two conditions; the only question is to what extent the overpowering sample number masks the effect.
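
To make the skew concrete, here is a minimal sketch, assuming the heatmap scales each gene row to z-scores across all plotted samples (a common default, but not confirmed in this thread). The expression levels and the helper names `row_zscores` and `scaled_group_means` are made up for illustration.

```python
from statistics import mean, pstdev


def row_zscores(values):
    """Centre and scale one gene's values across all samples (row-wise z-score)."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]


def scaled_group_means(n_a, n_b, level_a=8.0, level_b=10.0):
    """Mean z-score per condition for one hypothetical gene.

    Condition A sits at level_a and condition B at level_b (log2 scale);
    both levels are invented numbers, not data from the thread.
    """
    values = [level_a] * n_a + [level_b] * n_b
    z = row_zscores(values)
    return mean(z[:n_a]), mean(z[n_a:])


# Same gene, same 15 samples in condition B, growing condition A.
for n_a in (15, 50, 200):
    z_a, z_b = scaled_group_means(n_a, 15)
    print(f"A={n_a:3d}, B=15 -> mean z(A)={z_a:+.2f}, mean z(B)={z_b:+.2f}")
```

As condition A grows, the row mean is pulled towards A, so the scaled values for A shrink towards zero (pale colours) while the 15 B samples are pushed to the extreme of the colour scale.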

ADD REPLY • link 7.9 years ago by ivivek_ngs ★ 5.2k

Yes, that is what I am trying to do. I also solved my problem by increasing the width of the heatmap image, so now I can see the variation of expression in the small (15-sample) experiment. I already extracted a few genes, not based on differential expression but on another feature I chose.
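
A rough sketch of why widening the image helps, assuming one equal-width column per sample and a hypothetical 800 px wide image (the real image size is not given in the thread):

```python
def column_width_px(image_width_px, n_samples):
    """Width of one sample column when all columns share the image equally."""
    return image_width_px / n_samples


def required_width_px(n_samples, min_col_px):
    """Image width needed so that every column is at least min_col_px wide."""
    return n_samples * min_col_px


IMAGE_WIDTH_PX = 800  # hypothetical image width
N_SMALL = 15          # samples in Experiment(2)

for n_large in (50, 200):
    total = n_large + N_SMALL
    per_col = column_width_px(IMAGE_WIDTH_PX, total)
    print(f"{total} columns: {per_col:.1f} px each, "
          f"{N_SMALL * per_col:.0f} px in total for the 15-sample experiment")

# Width needed to give each of the 215 columns 10 px (10 px is an arbitrary choice).
print(required_width_px(200 + N_SMALL, 10))
```

With 215 columns in the same image, each column is less than a third as wide as with 65 columns, so the 15-sample block becomes a thin strip unless the image is made wider.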

ADD REPLY • link 7.9 years ago by Ying ▴ 10

As long as it serves the experimental hypothesis that you are banking on, it is fine. Even increasing the height of the cells will not give you a stark difference, since comparing 200 with 15 just by expression value might not be such a good idea: the larger condition will override the smaller one. But if the genes you extracted still give you a clear distinction between the two conditions, then that is well and good. Cheers!

ADD REPLY • link 7.9 years ago by ivivek_ngs ★ 5.2k

Thanks. Can you give your input on this too? What if I take the mean of every 10 samples of the 200, so that I end up with 20 samples to compare, or plot in a heatmap, with the other 15 samples? What do you say? And thanks again.

ADD REPLY • link 7.9 years ago by Ying ▴ 10

No, I do not think it is a good idea. Are the 200 replicates or individual samples? If they are independent then it is not a good idea, as the result will be biased by the way you group the 200 samples into sets of 10. It is better to use a package like limma for differential expression on microarray data, to estimate the dispersion across samples and conditions and thus find differentially expressed genes; rather than subsetting, go for a differential expression analysis. You need to allow for different levels of variability between genes and between samples, and to make statistical conclusions more reliable when the number of samples is small. I would suggest you go through this paper, use it for your analysis, and then make the plots.
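
A small sketch of the grouping problem, using simulated values for a single gene (mean 8, SD 1 on the log2 scale, both invented) and a hypothetical helper `block_means`. It averages the same 200 values in blocks of 10 twice: once in the order given and once after sorting.

```python
import random
from statistics import mean, pvariance


def block_means(values, block_size):
    """Average consecutive blocks of block_size values."""
    return [mean(values[i:i + block_size])
            for i in range(0, len(values), block_size)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [rng.gauss(8.0, 1.0) for _ in range(200)]  # simulated, not real data

in_given_order = block_means(samples, 10)
in_sorted_order = block_means(sorted(samples), 10)

print("pseudo-samples:", len(in_given_order))
print("variance of the 200 samples:     ", round(pvariance(samples), 3))
print("variance of means, given order:  ", round(pvariance(in_given_order), 3))
print("variance of means, sorted order: ", round(pvariance(in_sorted_order), 3))
```

The 20 averaged pseudo-samples are much less variable than the real samples when the blocks are arbitrary, and look very different again if the blocks are chosen another way, so the heatmap would reflect the grouping choice rather than the biology.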

ADD REPLY • link 7.9 years ago by ivivek_ngs ★ 5.2k

As the samples are independent, if I do a differential expression analysis, do I need to treat the independent samples as replicates or not? Because, you know, all the packages for differential expression require replicates for each condition, right?

ADD REPLY • link 7.9 years ago by Ying ▴ 10

Yes, you can. You will have 2 conditions, each with its replicates, and both the between-sample variability and the gene-wise dispersion will be considered. You will normalize the data anyway before proceeding to the DE analysis. Try limma and see how the results change with different normalization methods and how well they partition your conditions.
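
To illustrate the idea of treating each independent sample as one replicate of its condition, here is a simplified stand-in, not limma itself. limma is an R/Bioconductor package that uses moderated t-statistics; this sketch computes a plain Welch t-statistic per gene on invented toy values, and `welch_t` and the gene names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for one gene; each sample is one replicate."""
    se = sqrt(variance(group_a) / len(group_a)
              + variance(group_b) / len(group_b))
    return (mean(group_b) - mean(group_a)) / se


# Toy log2 expression values: (condition A samples, condition B samples).
# Group sizes are deliberately unequal, as in the 200 vs 15 design.
expression = {
    "gene_up": ([8.0, 8.2, 7.9, 8.1, 8.0, 7.8], [10.1, 9.9, 10.2]),
    "gene_flat": ([8.0, 8.2, 7.9, 8.1, 8.0, 7.8], [8.1, 7.9, 8.0]),
}

# Rank genes by the absolute size of the statistic, largest first.
ranked = sorted(expression,
                key=lambda g: abs(welch_t(*expression[g])),
                reverse=True)
for gene in ranked:
    print(gene, round(welch_t(*expression[gene]), 2))
```

Because the standard error uses each group's own variance and size, the unequal group sizes are accounted for in the test itself, and the top-ranked genes can then be pulled out across all samples for the heatmap.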