Question

Multiple Correlations Or Anova

10.8 years ago

robjohn7000 ▴ 110

Hi,

I have the following data set and I'm not sure which statistical tool is appropriate: multiple correlation or ANOVA? The values are occurrences (%) of certain gene components associated with different characteristics (A to K) in bacteria under different experimental conditions. I would like to test whether any of the conditions are correlated with each other across the observations (A to K), and to know how to implement this in R. My problem is choosing a statistic that convincingly shows that the observations in bacteria arose because the gene components in two or more conditions are well correlated with each other.

         Cond1    Cond2    Cond3    Cond4    Cond5    Cond6
    A        0        1        2       16       17       18
    B        1        3        9       23       24       25
    C        0        1       16       30       31       32
    D        0        0       23       19       20       21
    E        0        0       30       26       27       28
    F       15       16        1       33       34       35
    G        0        0        8        1        2        3
    H        0        1       15        8        9       10
    I        0        0       22       15       16       17
    J        1        2       29       22       23       24
    K        0        1        4        5        6        7

Please help!

Rob

statistics r genetics • 3.1k views


score 0 · Answer 1 · 2013-07-13

Hello. If you want to show that any two conditions are correlated, you could calculate all the pairwise correlations and correct the p-values for multiple testing. Since your data are percentages, I would either apply the arcsine square-root transformation (which stabilises the variance of proportions) or use a non-parametric correlation test (e.g. Spearman). Here is some sample R code.

Just a thought...

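# Arcsine square-root transformation for proportions in [0, 1]
arcsine <- function(x){
    return(asin(sign(x) * sqrt(abs(x))))
}

# Read the tab-separated table; the first column holds the row names
dat <- read.table('dat.txt', header= TRUE, row.names= 1, sep= '\t')
dat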
  Cond1 Cond2 Cond3 Cond4 Cond5 Cond6
A     0     1     2    16    17    18
B     1     3     9    23    24    25
C     0     1    16    30    31    32
D     0     0    23    19    20    21
E     0     0    30    26    27    28
F    15    16     1    33    34    35
G     0     0     8     1     2     3
H     0     1    15     8     9    10
I     0     0    22    15    16    17
J     1     2    29    22    23    24
K     0     1     4     5     6     7

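# Number of unique pairs of conditions (15 for 6 conditions)
nr <- choose(ncol(dat), 2)
# One row per pair: the two condition names, the correlation and its p-value
dat.cor <- data.frame(condA= rep(NA, nr), condB= rep(NA, nr), cor= rep(NA, nr), pval= rep(NA, nr))

n <- 1
for(i in 1:(ncol(dat)-1)){
    for(j in (i+1):ncol(dat)){
        dat.cor$condA[n] <- colnames(dat)[i]
        dat.cor$condB[n] <- colnames(dat)[j]
        # Convert percentages to proportions, transform, then test Pearson's r
        p <- cor.test(arcsine(dat[,i]/100), arcsine(dat[,j]/100), method= 'pearson')
        dat.cor$cor[n] <- p$estimate
        dat.cor$pval[n] <- p$p.value
        n <- n+1
    }
}
# Holm correction for testing all pairs at once
dat.cor$padj <- p.adjust(dat.cor$pval, method= 'holm')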

dat.cor
   condA condB        cor         pval         padj
1  Cond1 Cond2  0.9906504 4.267460e-09 5.120952e-08
2  Cond1 Cond3 -0.4096434 2.108681e-01 1.000000e+00
3  Cond1 Cond4  0.5106776 1.084494e-01 1.000000e+00
4  Cond1 Cond5  0.5106776 1.084494e-01 1.000000e+00
5  Cond1 Cond6  0.5106776 1.084494e-01 1.000000e+00
6  Cond2 Cond3 -0.4574765 1.571276e-01 1.000000e+00
7  Cond2 Cond4  0.5257322 9.671666e-02 1.000000e+00
8  Cond2 Cond5  0.5257322 9.671666e-02 1.000000e+00
9  Cond2 Cond6  0.5257322 9.671666e-02 1.000000e+00
10 Cond3 Cond4  0.2076371 5.401187e-01 1.000000e+00
11 Cond3 Cond5  0.2076371 5.401187e-01 1.000000e+00
12 Cond3 Cond6  0.2076371 5.401187e-01 1.000000e+00
13 Cond4 Cond5  1.0000000 0.000000e+00 0.000000e+00
14 Cond4 Cond6  1.0000000 0.000000e+00 0.000000e+00
15 Cond5 Cond6  1.0000000 0.000000e+00 0.000000e+00
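
For the non-parametric route mentioned above, a minimal sketch using Spearman's rank correlation on the untransformed percentages might look like the following. The data frame is typed in from the question so the snippet runs on its own; the names `pairs` and `res`, the use of `combn()` to enumerate pairs, and `exact = FALSE` (chosen here because the data contain ties, so exact p-values are not available) are my own assumptions rather than part of the code above.

```r
# Data copied from the question (percent occurrences per condition)
dat <- data.frame(
    Cond1 = c(0, 1, 0, 0, 0, 15, 0, 0, 0, 1, 0),
    Cond2 = c(1, 3, 1, 0, 0, 16, 0, 1, 0, 2, 1),
    Cond3 = c(2, 9, 16, 23, 30, 1, 8, 15, 22, 29, 4),
    Cond4 = c(16, 23, 30, 19, 26, 33, 1, 8, 15, 22, 5),
    Cond5 = c(17, 24, 31, 20, 27, 34, 2, 9, 16, 23, 6),
    Cond6 = c(18, 25, 32, 21, 28, 35, 3, 10, 17, 24, 7),
    row.names = LETTERS[1:11]
)

# All unique pairs of conditions, one pair per row
pairs <- t(combn(colnames(dat), 2))
res <- data.frame(condA = pairs[, 1], condB = pairs[, 2],
                  rho = NA_real_, pval = NA_real_,
                  stringsAsFactors = FALSE)

for (k in seq_len(nrow(res))) {
    # Rank-based test, so no transformation of the percentages is needed;
    # exact = FALSE requests the asymptotic p-value because of ties
    ct <- cor.test(dat[[res$condA[k]]], dat[[res$condB[k]]],
                   method = "spearman", exact = FALSE)
    res$rho[k] <- ct$estimate
    res$pval[k] <- ct$p.value
}

# Holm correction across all pairs, as in the Pearson version
res$padj <- p.adjust(res$pval, method = "holm")
print(res)
```

Because Spearman's test only uses ranks, any monotone transformation (including the arcsine square-root) would give the same result, which is why the raw percentages are passed in directly.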