Scatter plot for two sample conditions
6.4 years ago
1769mkc ★ 1.2k

I have a data set comparing two conditions, HSC and CMP, and I want to draw a scatter plot along with the regression value. When I try to label the samples, they all get the same color. I want different colors for the labels so that they can be distinguished.

Here is my sample data set

               gene        HSC       CMP
     ENSG00000158292.6  1.8102636  2.456869
     ENSG00000162496.6  2.6796705  6.203838
    ENSG00000117115.10  3.4509115  5.555739
    ENSG00000159423.14  3.6809277  5.063446
     ENSG00000053372.4  5.7089974  6.851090
     ENSG00000127423.8  4.4894292  5.996304
     ENSG00000242125.3 10.6258802 11.715932

Here is my code

library("ggpubr")
ggscatter(data1, x = "HSC", y = "CMP", 
          add = "reg.line", conf.int = FALSE, 
          cor.coef = TRUE, cor.method = "pearson",
          color = "black", size=1)

Any suggestions or help would be highly appreciated.


If I understand correctly, you want a scatter plot with a linear regression?

Edit: updated to take your comment into account, i.e. the color variable.

Using ggplot2, if data1 is your data frame:

    library(ggplot2)

    ggplot(data1, aes(x = HSC, y = CMP, col = variable)) +
        geom_point() +
        geom_smooth(method = "lm", se = TRUE)

I tried that too, after melting the data:

    library(reshape2)
    library(ggplot2)

    ex <- melt(gendat, id.vars = c("HSC", "CMP"))
    ggplot(ex, aes(x = HSC, y = CMP, color = variable)) +
        geom_point(shape = 20) +
        geom_smooth(method = "lm", col = "black", se = TRUE)


What is the variable "variable"? I do not see it in your original question.


The "variable" column appears after I melt the data:

          HSC       CMP  variable               value
    1.8102636  2.456869      gene   ENSG00000158292.6
    2.6796705  6.203838      gene   ENSG00000162496.6
    3.4509115  5.555739      gene  ENSG00000117115.10
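
As an aside, the "variable" column in the melted output holds only the string "gene" because HSC and CMP were passed as id.vars, so every point falls into a single color group. Below is a minimal sketch of the opposite direction, assuming the goal is one row per gene and sample: melt with gene as the id variable, so that "variable" takes the values HSC and CMP. The data frame is copied from the first three rows of the question; the names `long` and `n_levels` are illustrative only.

```r
library(reshape2)

# First three rows of the sample data from the question
data1 <- data.frame(
  gene = c("ENSG00000158292.6", "ENSG00000162496.6", "ENSG00000117115.10"),
  HSC  = c(1.8102636, 2.6796705, 3.4509115),
  CMP  = c(2.456869, 6.203838, 5.555739),
  stringsAsFactors = FALSE
)

# Melt with gene as the id variable: HSC and CMP become levels of "variable"
long <- melt(data1, id.vars = "gene")

# "variable" now has two levels, so mapping color = variable gives two colors
n_levels <- nlevels(long$variable)
print(long)
```

Note that in this long format HSC and CMP are no longer separate columns, so it suits plots of value per sample rather than the HSC-versus-CMP scatter itself.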

I edited my answer, so each point will now be colored according to the "variable" column.


Or am I perhaps melting the data the wrong way?


Maybe your question is not clear. Do you want a scatter plot where each point is colored based on a variable (in this case, the one defined in the "variable" column)? If so, my answer should work.


What I wanted is this: say I have a sample called HSC. All the genes from HSC should have one color, and the genes from the other sample should have a different color. The second code I posted differs from the first one: I get the "variable" column after I melt the data frame.


I thought all genes were found in the two samples? You could add an additional column to check whether the gene is found in the sample of interest (i.e. its value is > 0):

    data1$inHSC <- data1$HSC > 0

and then:

    ggplot(data1, aes(x = HSC, y = CMP, col = inHSC)) +
        geom_point() +
        geom_smooth(method = "lm", se = TRUE)

Yes, the genes are common to both samples.


OK, so does the answer in my comment just above work for you?


Yes, it works. Is there a way to display R^2? In addition, I would like to label data points with their gene names beyond a certain threshold. For example, if the difference between the two values of a point is > 1 or 2, I want to highlight or label that point.
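
One possible way to approach this, as a rough sketch rather than a definitive answer: compute R^2 from an `lm()` fit and print it with `annotate()`, then flag genes whose HSC and CMP values differ by more than a threshold and label only those with `geom_text()`. The data frame is copied from the question. The column name `flagged`, the threshold of 2, and reading "difference" as the absolute difference between CMP and HSC are assumptions made for illustration.

```r
library(ggplot2)

# Sample data copied from the question
data1 <- data.frame(
  gene = c("ENSG00000158292.6", "ENSG00000162496.6", "ENSG00000117115.10",
           "ENSG00000159423.14", "ENSG00000053372.4", "ENSG00000127423.8",
           "ENSG00000242125.3"),
  HSC = c(1.8102636, 2.6796705, 3.4509115, 3.6809277,
          5.7089974, 4.4894292, 10.6258802),
  CMP = c(2.456869, 6.203838, 5.555739, 5.063446,
          6.851090, 5.996304, 11.715932),
  stringsAsFactors = FALSE
)

# R^2 of the linear fit CMP ~ HSC
fit <- lm(CMP ~ HSC, data = data1)
r2 <- summary(fit)$r.squared

# Flag genes whose two values differ by more than the threshold (assumed: 2)
threshold <- 2
data1$flagged <- abs(data1$CMP - data1$HSC) > threshold

p <- ggplot(data1, aes(x = HSC, y = CMP)) +
  # Color points by whether they exceed the threshold
  geom_point(aes(color = flagged)) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, color = "black") +
  # Label only the flagged genes
  geom_text(data = subset(data1, flagged), aes(label = gene),
            vjust = -0.8, size = 3) +
  # Print R^2 in the top-left corner of the panel
  annotate("text", x = -Inf, y = Inf, hjust = -0.1, vjust = 1.5,
           label = sprintf("R^2 = %.3f", r2))
print(p)
```

With many labeled genes the text may overlap. A label-repelling package such as ggrepel is a common remedy, but it is not used here to keep the sketch dependent on ggplot2 only.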
