Question: Scatter plot in ggplot2 with two colours for two different conditions
krushnach80 wrote:

I'm working with two conditions, wild type and treated. I can produce a scatter plot, but I want the two conditions in two different colours and I'm not sure how to do it. I tried grouping them by melting the data, but got an aesthetics error. I then removed the gene column so that only the two mean columns remain, but all the data points are still drawn in the same colour.

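df <- HL_60
# gene <- df[, 1]
# Columns 2-3 are the wild-type replicates, columns 4-5 the ATRA replicates
WT <- df[, 2:3]
ATRA <- df[, 4:5]
# Per-gene mean across the replicates of each condition
wt.mean <- rowMeans(WT)
atra.mean <- rowMeans(ATRA)
WT_ATRA <- cbind.data.frame(wt.mean, atra.mean)
head(WT_ATRA)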



library(ggplot2)
library(ggpmisc)
D1 <-WT_ATRA
head(D1)
my.formula  <- y ~ x

p <- ggplot(data = WT_ATRA, aes(x= wt.mean,y = atra.mean) )+ geom_point(color = 'red',size = .9)+ 
  geom_smooth(method = "lm", se=FALSE, color="black", formula = my.formula) +
  stat_poly_eq(formula = my.formula,
               eq.with.lhs = "italic(hat(y))~`=`~",
               aes(label = paste(..eq.label.., ..rr.label.., sep = "*plain(\",\")~")), 
               parse = TRUE)

The figure I get is this one.

Any suggestion or help on how to use two colours for the two different conditions would be highly appreciated.

R • 2.0k views
modified 10 months ago by arup • written 10 months ago by krushnach80

I would add a column called condition (with values WT or treated) and melt the data frame; colouring by condition should then give you the two conditions in two separate colours.

Please don't follow this, as it is not possible with this plot; see the discussion below. Apologies for recommending this before looking at the data.

modified 10 months ago • written 10 months ago by Sej Modha

Well, I would be glad if you could show me how to do it in the code I posted. When I melted the data earlier, I got one column for gene, one column called variable and a last one called value, but I still couldn't figure it out.

modified 10 months ago • written 10 months ago by krushnach80

Am I not doing the same thing when I bind the columns, where one is my wild type and the other is treated?

modified 10 months ago • written 10 months ago by krushnach80
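
For readers following along, here is a minimal sketch of what the melted (long) layout described above looks like, using the WT and AT columns of the sample rows posted later in this thread. The `condition` column and the renamed `sample` and `expression` columns are hypothetical helpers, not part of the original code. Colouring by condition works in this layout because each row is a single measurement; it does not carry over to a WT-versus-ATRA scatter, where every point combines both conditions.

```r
library(reshape2)
library(ggplot2)

# Sample rows copied from this thread (WT and ATRA replicates only)
HL_60 <- data.frame(
  gene = c("ENSG00000227232.5", "ENSG00000278267.1", "ENSG00000238009.4",
           "ENSG00000233750.3", "ENSG00000269981.1", "ENSG00000241860.4",
           "ENSG00000241599.1"),
  WT1 = c(5.2822087357, 3.5858305786, 2.22313652, 1.1525240028, 0,
          2.5587816592, 0),
  WT2 = c(6.4447483588, 3.6836795858, 2.3074139286, 1.7527357273, 0,
          2.8718278554, 1.7527357273),
  AT1 = c(6.8860571504, 3.5523112, 2.6703264597, 2.6703264597, 2.8343603766,
          4.9846498052, 2.6703264597),
  AT2 = c(6.9411803286, 3.5474228185, 1.6500091151, 3.2560180286, 3.2560180286,
          5.5538107754, 3.5474228185)
)

# Long format: one row per gene/sample measurement
long <- melt(HL_60, id.vars = "gene",
             variable.name = "sample", value.name = "expression")

# Hypothetical helper column derived from the sample name prefix
long$condition <- ifelse(grepl("^WT", long$sample), "WT", "ATRA")

# Condition is not one of the axes here, so it can be mapped to colour
p <- ggplot(long, aes(x = gene, y = expression, colour = condition)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

Printing `p` draws one point per replicate, with WT and ATRA in different colours along a gene axis.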
Can you post some example data that resembles HL_60?

written 10 months ago by Sej Modha
Okay, here is a small subset of my data set:


gene               WT1           WT2           AT1           AT2           VD1           VD2
ENSG00000227232.5  5.2822087357  6.4447483588  6.8860571504  6.9411803286  5.3968150313  6.4528522014
ENSG00000278267.1  3.5858305786  3.6836795858  3.5523112     3.5474228185  3.7568282659  2.9090525017
ENSG00000238009.4  2.22313652    2.3074139286  2.6703264597  1.6500091151  2.1942827694  0.7234491107
ENSG00000233750.3  1.1525240028  1.7527357273  2.6703264597  3.2560180286  2.6701449288  1.8497987198
ENSG00000269981.1  0             0             2.8343603766  3.2560180286  1.4793911805  0
ENSG00000241860.4  2.5587816592  2.8718278554  4.9846498052  5.5538107754  3.5521141942  3.8751528623
ENSG00000241599.1  0             1.7527357273  2.6703264597  3.5474228185  1.4793911805  0


Have a look; I would be glad to get your suggestions.

The first column is the gene and the rest are my samples. The WT columns are my wild-type control and the others are treated, so I'm making pairwise comparisons.

modified 10 months ago • written 10 months ago by krushnach80
So, what should colour represent? I honestly don't get it.

written 10 months ago by e.rempel

Well, what I need is the R^2 value, but my boss wants the points labelled to show that they come from two different conditions. I would be glad if you could solve my woe.

written 10 months ago by krushnach80
Hi krushnach80. I believe that your supervisor does not understand the plot.

What you are plotting cannot be colour-coded based on 2 different conditions because the values in your plot are summarising a difference between both conditions. This is just a simple scatter plot comparing the mean in disease versus WT.

You could colour the dots in a gradient fashion based on the intensity of the mean.

written 10 months ago by Kevin Blighe
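
The gradient suggestion above could be sketched roughly as follows, using the sample rows posted in this thread. The `intensity` column (the average of the two condition means) is a hypothetical choice made for illustration; any other per-gene summary could be mapped to colour in the same way.

```r
library(ggplot2)

# Sample rows copied from this thread (WT and ATRA replicates only)
HL_60 <- data.frame(
  WT1 = c(5.2822087357, 3.5858305786, 2.22313652, 1.1525240028, 0,
          2.5587816592, 0),
  WT2 = c(6.4447483588, 3.6836795858, 2.3074139286, 1.7527357273, 0,
          2.8718278554, 1.7527357273),
  AT1 = c(6.8860571504, 3.5523112, 2.6703264597, 2.6703264597, 2.8343603766,
          4.9846498052, 2.6703264597),
  AT2 = c(6.9411803286, 3.5474228185, 1.6500091151, 3.2560180286, 3.2560180286,
          5.5538107754, 3.5474228185)
)

# Per-gene means for each condition, as in the question
WT_ATRA <- data.frame(
  wt.mean   = rowMeans(HL_60[, c("WT1", "WT2")]),
  atra.mean = rowMeans(HL_60[, c("AT1", "AT2")])
)

# ASSUMPTION: "intensity of the mean" is taken as the average of both means
WT_ATRA$intensity <- rowMeans(WT_ATRA[, c("wt.mean", "atra.mean")])

# Continuous colour scale from low (blue) to high (red) intensity
p <- ggplot(WT_ATRA, aes(x = wt.mean, y = atra.mean, colour = intensity)) +
  geom_point(size = 2) +
  scale_colour_gradient(low = "blue", high = "red")
```

This keeps one point per gene and uses colour only to show how strongly the gene is expressed overall, not which condition it belongs to.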

Yeah, I'm just plotting the replicates to show that when there is induction with ATRA and VD3 the R^2 decreases, which shows that expression does change after induction. Do you have any other way to show the difference using a scatter plot?

written 10 months ago by krushnach80

What I understand is that he wants the points coming from wild type to have one colour and those coming from the ATRA treatment a different colour, because that is how his scatter plot looked when he made it with the SeqMonk tool. I'm trying to do the same.

written 10 months ago by krushnach80
...but, if you wanted to do that, then each dot would have 2 colours because each dot represents both WT (value on x-axis) and ATRA (value on y-axis).

You probably mean something like this: https://www.bioinformatics.babraham.ac.uk/projects/seqmonk/Help/3%20Visualisation/3.2%20Figures%20and%20Graphs/scatter_plot_sublists.png

In that plot, only certain dots are shaded, presumably those that have passed some other statistical test like FDR-corrected P values or fold-change, somewhat akin to a volcano plot.

written 10 months ago by Kevin Blighe

Yeah, exactly like that SeqMonk plot. And yes to this: "then each dot would have 2 colours because each dot represents both WT (value on x-axis) and ATRA (value on y-axis)", because I have the same number of genes for both conditions, WT and ATRA. I played around with all kinds of combinations to get two different colours for WT and ATRA for the same gene, as they have different values.

I have also posted a small sample of my data above.

modified 10 months ago • written 10 months ago by krushnach80
To give you an example, this code will colour red (firebrick1) any genes that have a linear fold change >= 4.0 in ATRA, and colour green (forestgreen) any genes with a linear fold change <= 1/4 (that is, at least 4-fold down).

Not sure if this helps (or even works). You can play around with the cut-offs in order to choose how you want to shade the points based on different statistical cut-off thresholds.

# Linear fold change of ATRA over WT; as a ratio of non-negative values it is never negative
fc <- WT_ATRA$atra.mean / WT_ATRA$wt.mean
WT_ATRA$significance <- "NS"
# which() skips the NaN produced by 0/0, so those genes stay "NS"
WT_ATRA$significance[which(fc >= 4.0)] <- "Up"
WT_ATRA$significance[which(fc <= 1/4)] <- "Down"
WT_ATRA$significance <- factor(WT_ATRA$significance, levels=c("NS","Up","Down"))

ggplot(data = WT_ATRA, aes(x=wt.mean, y=atra.mean) ) + 
        geom_point(aes(color=significance), alpha=1/2, size=0.8) +
        # Colours follow the factor levels: NS, Up, Down
        scale_color_manual(values=c("grey", "firebrick1", "forestgreen")) +
                ....

modified 10 months ago • written 10 months ago by Kevin Blighe

I will try your code and it will definitely work, but I have to explain the plot to my boss. It seems he doesn't understand it.

written 10 months ago by krushnach80
Okay, I will be your new boss

written 10 months ago by Kevin Blighe

Glad I could have that opportunity, maybe in the future.

written 10 months ago by krushnach80
Just hook your boss up on biostars ;)

modified 10 months ago • written 10 months ago by e.rempel
arup wrote:

The plot is in a 2D plane, which means each point has two values (x, y) associated with it. In your case, each point is a gene and has (wt_mean, atra_mean) as its coordinates. So I don't get the point of colouring on the basis of wt_mean and atra_mean. What are you trying to infer?

The plot you mentioned in the comment uses fold change as the factor for the colour code, which can be achieved with the following code in ggplot2.

  # The names in `values` must match the labels produced by cut() exactly (no spaces)
  geom_point(aes(colour = cut(log2fc, c(-5, -1, 1, 5))), size = 2) +
  scale_color_manual(name = "Fold Change",
                     values = c("(1,5]" = "#FF6666",
                                "(-1,1]" = "#F5F5F5",
                                "(-5,-1]" = "#60B9FF"),
                     labels = c("<-2", "-2< & <2", ">2"))

log2fc is the log2-transformed fold change, which is divided into three bins: (-5,-1], (-1,1] and (1,5]. Change them according to your requirements.

written 10 months ago by arup
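
As a self-contained illustration of the binning approach in the answer above, here is a sketch built on the sample rows from this thread. It rests on an assumption the thread does not confirm: the values are treated as already log2-scaled, so the log2 fold change is taken as the difference of the two means. If the values are on a linear scale, `log2(atra.mean / wt.mean)` would be the usual replacement. The `fc_bin` column, the open-ended breaks and the bin labels are hypothetical choices.

```r
library(ggplot2)

# Sample rows copied from this thread (WT and ATRA replicates only)
HL_60 <- data.frame(
  WT1 = c(5.2822087357, 3.5858305786, 2.22313652, 1.1525240028, 0,
          2.5587816592, 0),
  WT2 = c(6.4447483588, 3.6836795858, 2.3074139286, 1.7527357273, 0,
          2.8718278554, 1.7527357273),
  AT1 = c(6.8860571504, 3.5523112, 2.6703264597, 2.6703264597, 2.8343603766,
          4.9846498052, 2.6703264597),
  AT2 = c(6.9411803286, 3.5474228185, 1.6500091151, 3.2560180286, 3.2560180286,
          5.5538107754, 3.5474228185)
)

WT_ATRA <- data.frame(
  wt.mean   = rowMeans(HL_60[, c("WT1", "WT2")]),
  atra.mean = rowMeans(HL_60[, c("AT1", "AT2")])
)

# ASSUMPTION: values are already log2-scaled, so log2 fold change is a difference
WT_ATRA$log2fc <- WT_ATRA$atra.mean - WT_ATRA$wt.mean

# Open-ended breaks so that no gene falls outside the bins and becomes NA
WT_ATRA$fc_bin <- cut(WT_ATRA$log2fc,
                      breaks = c(-Inf, -1, 1, Inf),
                      labels = c("Down", "NS", "Up"))

p <- ggplot(WT_ATRA, aes(x = wt.mean, y = atra.mean)) +
  geom_point(aes(colour = fc_bin), size = 2) +
  # Dashed diagonal marks equal expression in both conditions
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  scale_colour_manual(name = "Fold change",
                      values = c(Down = "#60B9FF", NS = "grey60", Up = "#FF6666"),
                      drop = FALSE)
```

Naming the bins explicitly with `labels` avoids having to match the interval strings that `cut()` generates, and `drop = FALSE` keeps empty bins in the legend.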

Okay, so I can put the FC into it directly instead of computing it over the samples again.

written 10 months ago by krushnach80