Correlation between expression and tumor size
3.7 years ago
newbie ▴ 120

I have the following information in a data frame named data. The first two columns are the expression values of two genes, and the third column is tumour size.

data:

structure(list(KRAS = c(1.84799690655495, 0.485426827170242, 
1.58496250072116, 0.925999418556223, 2.82781902461732, 1.53605290024021, 
1.37851162325373, 1.0703893278914, 0.765534746362977, 1.63226821549951, 
3.13750352374994, 0.84799690655495, 1, 0.137503523749935, 0.37851162325373, 
2.98185265328974, 1.13750352374994, 4.54225804976692, 1.53605290024021, 
4.53605290024021, 4.12928301694497, 0.84799690655495, 1.13750352374994, 
2.60880924267552, 1.13750352374994, 0.678071905112638, 0.765534746362977, 
0.84799690655495, 4.93545974780529, 0.584962500721156, 1.8073549220576, 
2.16992500144231, 1.13750352374994, 1.53605290024021, 1.32192809488736, 
1.72246602447109, 3.40599235967584, 1.72246602447109, 2.20163386116965, 
2.58496250072116, 0.584962500721156, 0.925999418556223, 1.0703893278914, 
1.37851162325373, 2.58496250072116, 0.765534746362977, 1.43295940727611, 
1.48542682717024, 2, 3.83794324189103), HRAS = c(2.88752527074159, 
2.88752527074159, 2.10433665981474, 0.925999418556223, 4.54843662469604, 
3.33628338786443, 3.30742852519225, 3.32192809488736, 1.58496250072116, 
4.41278152533848, 4.20945336562895, 3.92599941855622, 2.51096191927738, 
1.84799690655495, 3, 1.8073549220576, 3.01792190799726, 5.24412594328373, 
1.88752527074159, 5.6409679104499, 5.02680005934372, 3.877744249949, 
1.88752527074159, 2.13750352374994, 1.67807190511264, 0.925999418556223, 
2.48542682717024, 3.26303440583379, 5.95419631038687, 4.12928301694497, 
3.47248777146274, 3.91647664443772, 2.26303440583379, 3.96347412397489, 
0.678071905112638, 2.56071495447448, 4.65535182861255, 3.20163386116965, 
3.12101540096137, 3.62058641045188, 2.56071495447448, 1.32192809488736, 
1.84799690655495, 4.62643913669732, 3.91647664443772, 2.0703893278914, 
1.37851162325373, 1.48542682717024, 3.85798099512757, 4.12101540096137
), tumsize = c("6.5", "3", "3.5", "2.8", "1.3", "3.4", "2.4", 
"3.5", "5.7", "3.7", "4.5", "1.4", "3.6", "3.5", "5.5", "3", 
"3.4", "1.5", "5", "3", "1.7", "1.5", "1", "2.5", "3.3", "2.6", 
"1", "2.6", "0.5", "1.5", "2.5", "1.5", "2.3", "1.5", "3.6", 
"4.5", "3", "1.5", "4", "1.5", "2", "4", "5", "4.5", "2", "2.4", 
"2.5", "2.9", "5.2", "1.7")), row.names = c(1L, 2L, 3L, 4L, 5L, 
7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 
20L, 21L, 22L, 23L, 25L, 26L, 28L, 33L, 34L, 35L, 36L, 37L, 38L, 
39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 
52L, 53L, 54L, 55L, 56L, 57L), class = "data.frame")

I computed the correlation between the expression of the two genes using Spearman correlation.

library(ggpubr)  # provides ggscatter() and stat_cor()

ggscatter(data, x = "KRAS", y = "HRAS",
          size = 0.3, combine = TRUE, ylab = "HRAS",
          palette = "jco", add = "reg.line", conf.int = TRUE) +
  stat_cor(method = "spearman")

[Figure: scatter plot of HRAS against KRAS expression with regression line and Spearman correlation]

I'm also interested in the correlation between KRAS expression and tumor size, but I can't get it to work. Is something wrong with the following?

ggscatter(data, x = "KRAS", y = "tumsize",
          size = 0.3, combine = TRUE, ylab = "Tumor size",
          palette = "jco", add = "reg.line", conf.int = TRUE) +
  stat_cor(method = "spearman")

`geom_smooth()` using formula 'y ~ x'
There were 18 warnings (use warnings() to see them)

The resulting plot looks like the one below. Can anyone tell me how to check the correlation between expression and tumor size?

[Figure: scatter plot of tumor size against KRAS expression]

R expression tumorsize correlation • 747 views
3.7 years ago

Your question does not state exactly what you intend to do or where the problem lies.

First, you need to convert tumsize to numeric:

data$tumsize <- as.numeric(data$tumsize)
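
Before converting, it can be worth confirming that every value actually parses as a number, because as.numeric() turns unparseable strings into NA (with only a warning). A minimal sketch, using a toy subset of the question's values; the names toy, converted and n_failed are illustrative and not part of the original post:

```r
# Toy subset of the question's data: tumsize is stored as character.
toy <- data.frame(
  KRAS    = c(1.848, 0.485, 1.585, 0.926, 2.828),
  tumsize = c("6.5", "3", "3.5", "2.8", "1.3"),
  stringsAsFactors = FALSE
)

# Convert, then count values that became NA only because of the coercion.
converted <- suppressWarnings(as.numeric(toy$tumsize))
n_failed  <- sum(is.na(converted) & !is.na(toy$tumsize))
stopifnot(n_failed == 0)

# Safe to overwrite the column once nothing was lost.
toy$tumsize <- converted
str(toy)
```

If n_failed were greater than zero, inspecting the offending strings (for example stray text or decimal commas) before overwriting the column would avoid silently dropping samples.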

For correlation, you just need:

cor(x = data[['KRAS']], y = data[['tumsize']], method = 'spearman')
cor.test(x = data[['KRAS']], y = data[['tumsize']], method = 'spearman')
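
The tumor sizes contain repeated values, so cor.test() cannot compute an exact Spearman p-value and warns about ties. A hedged sketch on the first eight KRAS/tumsize pairs from the question (KRAS rounded to three decimals for brevity), requesting the asymptotic approximation explicitly with exact = FALSE:

```r
# First eight KRAS / tumsize pairs from the question (KRAS rounded).
kras    <- c(1.848, 0.485, 1.585, 0.926, 2.828, 1.536, 1.379, 1.070)
tumsize <- c(6.5, 3, 3.5, 2.8, 1.3, 3.4, 2.4, 3.5)

# exact = FALSE asks for the asymptotic approximation up front, which
# avoids the "Cannot compute exact p-value with ties" warning.
res <- cor.test(kras, tumsize, method = "spearman", exact = FALSE)

res$estimate  # Spearman's rho
res$p.value   # two-sided p-value
```

The estimate is the same value that cor() returns with method = "spearman"; only the way the p-value is computed changes.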

You can also build a linear regression model:

model <- lm(tumsize ~ KRAS, data = data)
summary(model)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2244 -0.8217 -0.2413  0.6913  3.5984 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.4527     0.3467   9.959  2.9e-13 ***
KRAS         -0.2982     0.1654  -1.803   0.0776 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.345 on 48 degrees of freedom
Multiple R-squared:  0.06344,   Adjusted R-squared:  0.04393 
F-statistic: 3.251 on 1 and 48 DF,  p-value: 0.07764

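Exponentiated coefficients with 95% confidence intervals (strictly, these are odds ratios only for a logistic regression model, not for a linear model such as the one above):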

exp(cbind('OR' = coef(model), confint.default(model, level = 0.95)))

                    OR      2.5 %    97.5 %
(Intercept) 31.5853430 16.0098441 62.313779
KRAS         0.7421502  0.5366864  1.026273
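
Odds ratios in the strict sense come from logistic regression, which needs a binary outcome. Purely as an illustration, the sketch below dichotomises tumor size at a hypothetical cutoff of 3 (this cutoff is an assumption, not something from the original post) and fits glm() with a binomial family on the first sixteen samples (KRAS rounded). The names large, fit and or_table are likewise illustrative. Dichotomising a continuous outcome discards information, so this only shows where odds ratios come from, not a recommended analysis.

```r
# First sixteen KRAS / tumsize pairs from the question (KRAS rounded).
kras <- c(1.848, 0.485, 1.585, 0.926, 2.828, 1.536, 1.379, 1.070,
          0.766, 1.632, 3.138, 0.848, 1.000, 0.138, 0.379, 2.982)
tumsize <- c(6.5, 3, 3.5, 2.8, 1.3, 3.4, 2.4, 3.5,
             5.7, 3.7, 4.5, 1.4, 3.6, 3.5, 5.5, 3)

# Hypothetical binary outcome: 1 if tumor size exceeds 3, else 0.
large <- as.integer(tumsize > 3)

# Logistic regression; exponentiated coefficients are odds ratios.
fit <- glm(large ~ kras, family = binomial)
or_table <- exp(cbind(OR = coef(fit), confint.default(fit, level = 0.95)))
or_table
```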

Kevin


Oh yes. I wonder how I didn't think to check this with str(data). Thanks a lot.

3.7 years ago

Hi, it looks like tumour size is stored as character rather than numeric.

If you run str(data), you will see the types of all variables in your data frame.

You could convert tumsize in the data frame directly: data$tumsize <- as.numeric(data$tumsize)
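
A quick way to spot this kind of problem is to list the class of every column; cor() also refuses a character vector outright, which makes the cause explicit. A minimal sketch on a toy data frame built from the question's first rows (the names toy, col_classes and err are illustrative):

```r
# Toy data frame mimicking the question: tumsize stored as character.
toy <- data.frame(
  KRAS    = c(1.848, 0.485, 1.585, 0.926),
  tumsize = c("6.5", "3", "3.5", "2.8"),
  stringsAsFactors = FALSE
)

# Class of each column; tumsize shows up as "character".
col_classes <- sapply(toy, class)
col_classes

# cor() stops with an error for a character vector; capture the message.
err <- tryCatch(
  cor(toy$KRAS, toy$tumsize, method = "spearman"),
  error = function(e) conditionMessage(e)
)
err
```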
