Question: Multiple Correlations Or Anova
7.2 years ago by
robjohn7000100
United Kingdom
robjohn7000100 wrote:

Hi,

I have the following data set and I'm not sure which statistical tool is appropriate: multiple correlation or ANOVA? The values are occurrences (%) of certain gene components associated with different characteristics (ABCDEFGHIJK) in bacteria under different experimental conditions. I want to test whether any of the conditions are correlated with each other across the observations (ABCDEFGHIJK), and I would like to know how to implement this in R. My problem is choosing a statistic that convincingly shows that the observations in the bacteria were possible because the gene components in two or more of the conditions are well correlated with each other.

```
     Cond1  Cond2  Cond3  Cond4  Cond5  Cond6
A        0      1      2     16     17     18
B        1      3      9     23     24     25
C        0      1     16     30     31     32
D        0      0     23     19     20     21
E        0      0     30     26     27     28
F       15     16      1     33     34     35
G        0      0      8      1      2      3
H        0      1     15      8      9     10
I        0      0     22     15     16     17
J        1      2     29     22     23     24
K        0      1      4      5      6      7
```

Rob

7.2 years ago by
dariober11k
WCIP | Glasgow | UK
dariober11k wrote:

Hello. If you want to show that there is a correlation between any two conditions, you could calculate all the pairwise correlations and correct the p-values for multiple testing. Since your data are percentages, I would either apply an arcsine square-root transformation or use a non-parametric correlation test (e.g. Spearman). Here is some sample R code.

Just a thought...

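```
# Arcsine square-root transformation for proportions in [0, 1]
arcsine <- function(x){
    return(asin(sign(x) * sqrt(abs(x))))
}

# Data from the question: rows are characteristics, columns are conditions
dat <- read.table(header= TRUE, text= "
  Cond1 Cond2 Cond3 Cond4 Cond5 Cond6
A     0     1     2    16    17    18
B     1     3     9    23    24    25
C     0     1    16    30    31    32
D     0     0    23    19    20    21
E     0     0    30    26    27    28
F    15    16     1    33    34    35
G     0     0     8     1     2     3
H     0     1    15     8     9    10
I     0     0    22    15    16    17
J     1     2    29    22    23    24
K     0     1     4     5     6     7
")

# One row per unordered pair of conditions
nr<- sum(1:(ncol(dat)-1))
dat.cor<- data.frame(condA= rep(NA, nr), condB= rep(NA, nr), cor= rep(NA, nr), pval= rep(NA, nr))

n<- 1
for(i in 1:(ncol(dat)-1)){
    for(j in (i+1):ncol(dat)){
        dat.cor$condA[n]<- colnames(dat)[i]
        dat.cor$condB[n]<- colnames(dat)[j]
        # Pearson correlation on the transformed proportions
        p<- cor.test(arcsine(dat[,i]/100), arcsine(dat[,j]/100), method= 'pearson')
        dat.cor$cor[n]<- p$estimate
        dat.cor$pval[n]<- p$p.value
        n<- n+1
    }
}

# Multiple-testing correction. This line is not in the post as extracted; the
# padj column below is consistent with Holm, the default method of p.adjust().
dat.cor$padj<- p.adjust(dat.cor$pval, method= 'holm')

dat.cor
# Output as posted. Its values (e.g. correlations of exactly 1 among Cond4,
# Cond5 and Cond6) match Pearson on the untransformed percentages, so a rerun
# with the arcsine transformation should give slightly different numbers.
#    condA condB        cor         pval         padj
# 1  Cond1 Cond2  0.9906504 4.267460e-09 5.120952e-08
# 2  Cond1 Cond3 -0.4096434 2.108681e-01 1.000000e+00
# 3  Cond1 Cond4  0.5106776 1.084494e-01 1.000000e+00
# 4  Cond1 Cond5  0.5106776 1.084494e-01 1.000000e+00
# 5  Cond1 Cond6  0.5106776 1.084494e-01 1.000000e+00
# 6  Cond2 Cond3 -0.4574765 1.571276e-01 1.000000e+00
# 7  Cond2 Cond4  0.5257322 9.671666e-02 1.000000e+00
# 8  Cond2 Cond5  0.5257322 9.671666e-02 1.000000e+00
# 9  Cond2 Cond6  0.5257322 9.671666e-02 1.000000e+00
# 10 Cond3 Cond4  0.2076371 5.401187e-01 1.000000e+00
# 11 Cond3 Cond5  0.2076371 5.401187e-01 1.000000e+00
# 12 Cond3 Cond6  0.2076371 5.401187e-01 1.000000e+00
# 13 Cond4 Cond5  1.0000000 0.000000e+00 0.000000e+00
# 14 Cond4 Cond6  1.0000000 0.000000e+00 0.000000e+00
# 15 Cond5 Cond6  1.0000000 0.000000e+00 0.000000e+00
```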
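The non-parametric alternative mentioned in the answer (Spearman) could look roughly like the sketch below. It is an illustration, not the answerer's code: the names `cond.pairs` and `spearman.cor` are made up here, `exact = FALSE` is an assumption chosen because the data contain many ties, and Holm is used for the correction only because it is the default of `p.adjust()`.

```r
# Data copied from the question (percentages per characteristic and condition)
dat <- read.table(header = TRUE, text = "
  Cond1 Cond2 Cond3 Cond4 Cond5 Cond6
A     0     1     2    16    17    18
B     1     3     9    23    24    25
C     0     1    16    30    31    32
D     0     0    23    19    20    21
E     0     0    30    26    27    28
F    15    16     1    33    34    35
G     0     0     8     1     2     3
H     0     1    15     8     9    10
I     0     0    22    15    16    17
J     1     2    29    22    23    24
K     0     1     4     5     6     7
")

# All unordered pairs of condition names, one pair per column
cond.pairs <- combn(colnames(dat), 2)

# Spearman correlation for each pair; exact = FALSE avoids the exact test,
# which is not available when there are ties (assumption: asymptotic p-values are acceptable)
res <- apply(cond.pairs, 2, function(p) {
    ct <- cor.test(dat[[p[1]]], dat[[p[2]]], method = "spearman", exact = FALSE)
    c(rho = unname(ct$estimate), pval = ct$p.value)
})

spearman.cor <- data.frame(condA = cond.pairs[1, ], condB = cond.pairs[2, ],
                           rho = res["rho", ], pval = res["pval", ])

# Correct the 15 p-values for multiple testing
spearman.cor$padj <- p.adjust(spearman.cor$pval, method = "holm")
spearman.cor
```

Because Spearman works on ranks, no transformation of the percentages is needed in this variant.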

Many thanks for your help Dario. Just a couple of questions:

Are `$estimate` and `$p.value` in these two lines components returned by a built-in function?

dat.cor$cor[n]<- p$estimate

dat.cor$pval[n]<- p$p.value
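For reference, a small sketch of where those two components come from: `cor.test()` is a built-in function (package `stats`) that returns a list of class `"htest"`, and `estimate` and `p.value` are elements of that list. The vectors `x` and `y` below are made-up values used only for illustration.

```r
# Two made-up vectors, only to have something to correlate
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 1, 4, 3, 6, 5)

p <- cor.test(x, y, method = "pearson")

class(p)      # the class R uses for most hypothesis-test results
names(p)      # lists the available components, including "estimate" and "p.value"

p$estimate    # the correlation coefficient (a named numeric, name "cor")
p$p.value     # the p-value of the test
```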

I'm not sure my interpretation of the output is right. For example:

    condA condB       cor         pval         padj
    1 Cond1 Cond2 0.9906504 4.267460e-09 5.120952e-08

Considering condA and CondB, the correlation between Cond1 and Cond2 is 0.9906504 with adjusted p-value = 5.120952e-08.
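The adjusted p-value quoted here is exactly 12 times the raw one, which is what Holm's method (the default of `p.adjust()`) gives for these p-values. That the answer used Holm is inferred from the numbers, not stated in the thread. A sketch using the p-values copied from the posted output:

```r
# Raw p-values copied from the pval column of the posted output (rows 1 to 15)
pval <- c(4.267460e-09, 2.108681e-01, 1.084494e-01, 1.084494e-01, 1.084494e-01,
          1.571276e-01, 9.671666e-02, 9.671666e-02, 9.671666e-02,
          5.401187e-01, 5.401187e-01, 5.401187e-01, 0, 0, 0)

padj <- p.adjust(pval, method = "holm")   # "holm" is also p.adjust's default

# Holm multiplies the i-th smallest p-value by (n - i + 1), caps the result at 1,
# and keeps the adjusted values non-decreasing. Three p-values of 0 rank ahead of
# the Cond1/Cond2 one, so it is the 4th of 15 and gets a factor of 15 - 4 + 1 = 12.
padj[1]
12 * pval[1]
```

Read this way, the interpretation in the comment holds: the Cond1/Cond2 correlation is 0.9906504 and remains significant after correcting for the 15 pairwise tests.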

Thanks