Heatmap of enriched GO terms in R
5.0 years ago

Hi,

I have done an RNA-seq analysis. Using |LFC| > 1 and FDR < 0.05 as cutoffs, I selected the differentially transcribed genes between the control and treated samples (I have 5 different treatments). I used those genes for a GO enrichment analysis in BiNGO, done separately for each condition. I now want to make a heatmap based on the p-values of the significantly over-represented GO categories.

I have already succeeded in making heatmaps in R using heatmap.2. The problem is that many GO categories have no p-value in some conditions, because the GO term was not enriched there. My first thought was to assign a p-value of 1 to those GO terms, but I think this will influence the clustering done by heatmap.2. I have been searching for some time for a way to ignore the NA values in my table, but I cannot find one. So what is the best way to handle the NA data in my table? Is it possible to do the clustering in heatmap.2 while ignoring the NA values? Is it correct to use a p-value of 1? Or is it better to cluster on another parameter instead of p-values?

Here is a part of the table I used for the input in R:

GO                                       cond1     cond2     cond3     cond4     cond5
aging                                    3.69E-02  3.76E-08  2.18E-02  3.17E-04  1.43E-06
amino.acid.import                        NA        NA        NA        2.16E-03  1.56E-02
amino.sugar.metabolic.process            NA        1.49E-02  2.53E-03  2.78E-01  6.24E-02
calcium.ion.homeostasis                  NA        4.68E-02  NA        3.26E-03  NA
carboxylic.acid.transmembrane.transport  3.66E-01  NA        3.30E-01  4.12E-07  2.38E-05
cell.death                               3.94E-02  1.10E-02  6.09E-03  NA        1.02E-03
cell.development                         NA        NA        NA        3.81E-02  5.87E-01
cell.growth                              NA        NA        NA        1.63E-02  NA
cell.wall.modification                   NA        NA        NA        5.21E-04  7.35E-03

R RNA-Seq • 2.9k views

I don't know what code you tried, but assuming you use heatmap.2, you can add the following argument to the call:

na.color="blue"
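As a rough sketch of how that argument might be used, the snippet below takes three rows from the table in the question, plots them with heatmap.2 from the gplots package, and colours the NA cells blue. Working on -log10(p) and switching the dendrograms off are assumptions made for this illustration, not something the original poster stated. The dist() check is included because na.color only affects how NA cells are drawn; it does not by itself make the clustering work.

```r
# Three rows copied from the table in the question.
p <- rbind(
  amino.acid.import = c(NA, NA, NA, 2.16e-03, 1.56e-02),
  cell.death        = c(3.94e-02, 1.10e-02, 6.09e-03, NA, 1.02e-03),
  cell.growth       = c(NA, NA, NA, 1.63e-02, NA)
)
colnames(p) <- paste0("cond", 1:5)

# Assumption: plot -log10(p) rather than raw p-values; NA stays NA.
score <- -log10(p)

# dist() skips NA cells pair by pair. Rows that share no observed
# condition (here cell.death and cell.growth) get an NA distance,
# and hclust() cannot cluster a distance matrix that contains NA.
d <- dist(score)
has_na_distance <- anyNA(d)

# Draw the heatmap without clustering, colouring NA cells blue.
# Guarded so the sketch still runs when gplots is not installed.
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(score,
                    Rowv = FALSE, Colv = FALSE, dendrogram = "none",
                    na.color = "blue", trace = "none")
}
```

If has_na_distance is TRUE for the full table, the row clustering needs either imputed values or a precomputed dendrogram.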

5.0 years ago

Hierarchical clustering involves two choices: first you calculate a distance matrix (using, for example, Euclidean distance or 1 - Pearson correlation), and then you choose a clustering method (complete, Ward, etc.).
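To make the two choices concrete, here is a minimal base R sketch. The matrix is random toy data generated only for illustration (it is not from the question), and the pairing of each distance with each linkage method is an arbitrary example.

```r
# Toy matrix of scores: 6 hypothetical GO terms x 5 conditions.
# The values are random and purely illustrative.
set.seed(1)
m <- matrix(runif(30, min = 0, max = 8), nrow = 6,
            dimnames = list(paste0("term", 1:6), paste0("cond", 1:5)))

# Choice 1: Euclidean distance between rows, Ward linkage.
d_euc  <- dist(m, method = "euclidean")
hc_euc <- hclust(d_euc, method = "ward.D2")

# Choice 2: 1 - Pearson correlation between rows, complete linkage.
# cor() correlates columns, so the matrix is transposed first.
d_cor  <- as.dist(1 - cor(t(m), method = "pearson"))
hc_cor <- hclust(d_cor, method = "complete")

# Row order produced by each combination.
order_euc <- hc_euc$labels[hc_euc$order]
order_cor <- hc_cor$labels[hc_cor$order]
```

In heatmap.2 the same choices can be supplied through its distfun and hclustfun arguments, or by passing as.dendrogram(hc_euc) as Rowv.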

In your case, it would be better to include the actual p-values from the GO analysis even when they are not significant, and then do the clustering. You can obtain these p-values with gProfileR or other R packages that perform GO enrichment. If you decide not to do so, using 1 for the non-significant p-values would not "bias" your clustering.
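The fallback of replacing missing p-values with 1 could look roughly like the sketch below, which uses five rows from the table in the question. The -log10 transform, Euclidean distance, and complete linkage are assumptions chosen for illustration; with that transform a p-value of 1 becomes a score of 0.

```r
# P-values copied from the table in the question (five GO terms).
p <- rbind(
  aging                         = c(3.69e-02, 3.76e-08, 2.18e-02, 3.17e-04, 1.43e-06),
  amino.acid.import             = c(NA, NA, NA, 2.16e-03, 1.56e-02),
  amino.sugar.metabolic.process = c(NA, 1.49e-02, 2.53e-03, 2.78e-01, 6.24e-02),
  calcium.ion.homeostasis       = c(NA, 4.68e-02, NA, 3.26e-03, NA),
  cell.death                    = c(3.94e-02, 1.10e-02, 6.09e-03, NA, 1.02e-03)
)
colnames(p) <- paste0("cond", 1:5)

# Replace missing p-values with 1 (term treated as not enriched).
p_filled <- p
p_filled[is.na(p_filled)] <- 1

# Assumption: cluster on -log10(p), so p = 1 maps to 0.
score <- -log10(p_filled)

# Cluster GO terms (rows) and conditions (columns).
hc_rows <- hclust(dist(score), method = "complete")
hc_cols <- hclust(dist(t(score)), method = "complete")

# Pass the precomputed dendrograms to heatmap.2 if gplots is available.
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(score,
                    Rowv = as.dendrogram(hc_rows),
                    Colv = as.dendrogram(hc_cols),
                    trace = "none")
}
```

Replacing the filled-in values with the real non-significant p-values, as suggested above, only changes the construction of p; the clustering steps stay the same.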

And I hope your p-values are corrected for multiple testing.