Heatmap Using R, With Special Conditions !!!
3
2
12.1 years ago
RDS ▴ 20

This dataset contains disease-gene-test results. Its columns are:

| Test | Gene | Relevance | Values |

Test and Gene are the two parameters on the x- and y-axes. Values holds the combined result for each Test-Gene pair. My problem is that there is one more parameter, Relevance, which represents the relevance of the Test-Gene pair and is boolean (only two values, YES/NO).

  • Relevance should be distinguished in the map by colour (for example red and green).
  • The gradient of that colour should represent the numerical value for that interaction.

The end result I am aiming for has Test on the x-axis and Gene on the y-axis, with the map drawn in only two colours (representing the Relevance values) and the gradient of each colour representing Values. Is it possible to produce this kind of heatmap, and if so, how? If not, is there another way to display this kind of data (similar to a heatmap)?

Help appreciated !!

Thanks,

RDS

[Image: something like this]

r heatmap • 19k views
5

In my lab, 3 people out of 60 are color-blind. This figure would be unintelligible for them.

4

ColorBrewer (http://colorbrewer2.org) is an excellent resource for choosing color schemes for scientific data, and addresses issues like color blindness.
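As a rough sketch of how a ColorBrewer palette could be applied in ggplot2: `scale_fill_distiller()` interpolates a named ColorBrewer palette onto a continuous fill. The palette name "PuOr" and the symmetric limits are assumptions made for illustration (check the colour-blind filter on the ColorBrewer site before settling on a palette), and the z values reuse the Example 1 test data from the answer further down the thread.

```r
library(ggplot2)

# Toy data: z values reuse the Example 1 test data from this thread.
dat <- data.frame(
  x = factor(rep(c("A", "B", "C"), 3)),
  y = factor(rep(c("R1", "R2", "R3"), each = 3)),  # hypothetical row labels
  z = c(34, 18, 31, 9, -2, 4, -21, -33, -13)
)

p_brewer <- ggplot(dat, aes(x = x, y = y, fill = z)) +
  theme_bw() +
  geom_tile() +
  geom_text(aes(label = z)) +
  # Diverging ColorBrewer palette; symmetric limits keep zero at the centre.
  scale_fill_distiller(palette = "PuOr", limits = c(-34, 34))
```

Printing `p_brewer` or passing it to `ggsave()` renders the plot.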

3

I'm red/green blind and I can perfectly interpret the figure.

3
12.1 years ago
bdemarest ▴ 460

Here is some R code that may help. I must admit I do not understand exactly how your Relevance and Value variables are related (or expected to interact). I have made some guesses. If I have guessed wrong, perhaps you can post some sample data to help clear up my confusion?

I have used scale_fill_gradient2 in these examples. You can specify three different colors: high, mid (default is white), low. You can specify the midpoint (value that maps to mid color, default=0), and upper and lower value limits. This may provide enough flexibility to show your data the way you want it.

In Example 1, negative values range from red to white, and positive values range from white to blue. In Example 2, where relevance is FALSE, no color is plotted and where relevance is TRUE, the color gradient spans the range red-white-blue.

library(ggplot2)

# Create test data.
dat1 = data.frame(x=factor(rep(c("A", "B", "C"), 3)), 
                  y=factor(rep(c(37, 8.7, -17.7), c(3, 3, 3))), 
                  z=c(34, 18, 31, 9, -2, 4, -21, -33, -13))

p1 = ggplot(dat1, aes(x=x, y=y, fill=z)) +
     theme_bw() +
     geom_tile() +
     geom_text(aes(label=paste(z))) +
     scale_fill_gradient2(midpoint=0, low="#B2182B", high="#2166AC") +
     ggtitle("Example 1")  # opts() was removed from ggplot2; ggtitle() sets the title

ggsave(plot=p1, filename="plot_1.png", height=4.5, width=5)

http://dl.dropbox.com/u/15656938/plot_1_20121105.png

# Create a slightly different test dataset.
dat2 = data.frame(
         gene=factor(rep(c("Gene_A", "Gene_B", "Gene_C"), 3)), 
         test=factor(rep(c("Test_1", "Test_2", "Test_3"), c(3, 3, 3))),
         relevance=c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE), 
         value=c(-16, 3, NA, NA, -13, 25, -4, NA, -26))

p2 = ggplot(dat2, aes(x=gene, y=test, fill=value)) +
     theme_bw() +
     geom_tile() +
     geom_text(aes(label=paste(value))) +
     scale_fill_gradient2(midpoint=0, low="#B2182B", high="#2166AC") +
     ggtitle("Example 2")  # opts() was removed from ggplot2; ggtitle() sets the title

ggsave(plot=p2, filename="plot_2.png", height=4.5, width=5)

http://dl.dropbox.com/u/15656938/plot_2_20121105.png

0

Yes, I understand. Please check the data and conditions below.

Sample Dataset:

TEST     UNIPROT_ID   VALUES        RELEVANCE
16847    P12821       0.150202199   YES
10964    P00918       0.289074042   YES
36315    P41145       0.203689575   NO
55033    P43088       0.183951524   NO
80965    P47869       0.156262678   YES
27639    P06276       0.130653334   YES
17170    Q72874       0.112942393   NO
15451    P25101       0.162308309   YES
27370    P04150       0.183241007   NO
27370    P04150       0.132568467   YES
27370    P10276       0.183241585   NO
39857    P00918       0.302647449   YES
33216    P10276       0.192524252   NO

Conditions:

TEST and UNIPROT_ID are the parameters for the x- and y-axes. VALUES belong to those pairs and should be represented on the map by a colour gradient. RELEVANCE is the correctness of the TEST and UNIPROT_ID pair.

There may be cases where TEST and UNIPROT_ID are the same but VALUES differ; these rows are distinguished by RELEVANCE.

Example:

33216    P10276   0.192524252      NO
33216    P10276   0.126589451      YES

So, to differentiate these values, I need to assign a specific colour to RELEVANCE and use its gradient to represent the VALUES. The other important thing is that the colour gradient should increase as VALUES increases.

An example for your reference: red is assigned to the RELEVANCE = NO data, and the density of the red depends on the VALUES for that data.

Regarding the examples you posted:

My dataset doesn't have negative values. So I cannot use gradient of two colours pointing one end for negative and the other for positive.

Example 2: The idea of "no plot colour" and text 'NA' is good. I will use that.

Appreciates,

RDS
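One way the requirement above (one hue per RELEVANCE level, intensity driven by VALUES) might be sketched in ggplot2 is to map `fill` to RELEVANCE and `alpha` to VALUES. This is only an illustration under assumptions: the colours, the alpha range, and the use of `facet_wrap()` to keep YES and NO rows for the same TEST/UNIPROT_ID pair from overlapping are choices made here, not something taken from the thread. The rows are a subset of the sample data posted above.

```r
library(ggplot2)

# A subset of the sample rows above; TEST is treated as a label, not a number.
dat <- data.frame(
  TEST       = factor(c(16847, 10964, 36315, 27370, 27370, 33216, 33216)),
  UNIPROT_ID = c("P12821", "P00918", "P41145", "P04150", "P04150",
                 "P10276", "P10276"),
  VALUES     = c(0.150202199, 0.289074042, 0.203689575, 0.183241007,
                 0.132568467, 0.192524252, 0.126589451),
  RELEVANCE  = c("YES", "YES", "NO", "NO", "YES", "NO", "YES")
)

p_alpha <- ggplot(dat, aes(x = TEST, y = UNIPROT_ID,
                           fill = RELEVANCE, alpha = VALUES)) +
  theme_bw() +
  geom_tile() +
  # Hue encodes RELEVANCE (orange and purple are arbitrary illustrative choices).
  scale_fill_manual(values = c(NO = "#E66101", YES = "#5E3C99")) +
  # Opacity encodes VALUES: larger values give a denser colour.
  scale_alpha_continuous(range = c(0.2, 1)) +
  # One panel per RELEVANCE level, so YES and NO rows for the same pair
  # are drawn side by side instead of on top of each other.
  facet_wrap(~ RELEVANCE) +
  ggtitle("Hue by RELEVANCE, intensity by VALUES")
```

Because alpha is rescaled to the observed range of VALUES, the faintest tile corresponds to the smallest value in the data rather than to zero; fixed `limits` on the alpha scale would change that.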

2
12.1 years ago

I am sure you can do this in R with the ggplot2 library, with a little extra work and your data in the form of a data frame. Check these two posts on how to achieve it: "ggplot2-quick-heatmap-plotting" and "Constructing an Heatmap of 'Distance of binding region relative to TSS'".

Cheers

0

Thanks for the response, but that's not what I was looking for. In the links you provided, the colour differentiation is based on data relative to just one axis. In this image, http://i.stack.imgur.com/dPAE2.png, the colours (red, green and blue) are relative to data on the y-axis (these colours are part of the data that represents the x-axis).

But my dataset is different:

| Test | Gene | Relevance | Values |

Relevance describes the test-gene pair as a whole: depending on certain conditions, the relevance of a given test/gene pair may be Yes or No. (It is not part of either axis.) I hope that explains the problem.

Appreciates,

RDS

0

So, you want this heatmap, but with the colour representing the boolean in Relevance. If that is correct, just point the fill variable of geom_tile to Relevance after melting (instead of rescale), and if you want numbers on top, use geom_text in addition. :)

0

Yeah, I can get the colours as you said, by pointing the "fill variable of geom_tile to Relevance". I am fine up to here, but how can I use the fourth column (of | Test | Gene | Relevance | Values |), VALUES, other than as text (using geom_text)? How can I make the gradient of those colours follow the VALUES data points?

RDS

0

Ah, then you might have to do some tweaking. Assign negative values to the elements with Relevance=No and positive values to those with Relevance=Yes, then fill using the values: the more negative, the more it is not relevant. So, take the subset where Relevance=No, negate the value column, and then plot. After that, generate the gradient as described here: How to do gradient
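A minimal sketch of this sign-flip idea, assuming the column names from the sample data posted earlier in the thread and reusing the red/blue colours from the scale_fill_gradient2 examples. The `signed` column name and the label function are hypothetical choices made for illustration.

```r
library(ggplot2)

# Rows taken from the sample data in this thread (unique TEST/UNIPROT_ID pairs only).
dat <- data.frame(
  TEST       = factor(c(16847, 10964, 36315, 55033, 80965, 17170)),
  UNIPROT_ID = c("P12821", "P00918", "P41145", "P43088", "P47869", "Q72874"),
  VALUES     = c(0.150202199, 0.289074042, 0.203689575,
                 0.183951524, 0.156262678, 0.112942393),
  RELEVANCE  = c("YES", "YES", "NO", "NO", "YES", "NO")
)

# Negate VALUES where RELEVANCE is NO, so the sign carries the boolean.
dat$signed <- ifelse(dat$RELEVANCE == "NO", -dat$VALUES, dat$VALUES)

p_signed <- ggplot(dat, aes(x = TEST, y = UNIPROT_ID, fill = signed)) +
  theme_bw() +
  geom_tile() +
  # Print the original (unsigned) value on each tile.
  geom_text(aes(label = round(VALUES, 3))) +
  # Red side = RELEVANCE NO, blue side = RELEVANCE YES; white sits at zero.
  scale_fill_gradient2(midpoint = 0, low = "#B2182B", mid = "white",
                       high = "#2166AC", name = "VALUES",
                       labels = function(b) format(abs(b))) +
  ggtitle("Sign encodes RELEVANCE, magnitude encodes VALUES")
```

This only works when each TEST/UNIPROT_ID pair appears once: a pair that has both a YES and a NO row would put two tiles in the same cell, so such data would need splitting (for example, one panel per RELEVANCE level) first.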

1
12.1 years ago

I tried using lattice in R to do roughly the same thing and I got close using latticeplot routines. It is probably better to use ggplot2 for this task.
