Question: Multiple Correlations or ANOVA
Asked 6.2 years ago by robjohn7000 (United Kingdom):

Hi,

I have the following data set and I'm not sure which statistical tool is correct to use: multiple correlation or ANOVA? The values in the dataset are occurrences (%) of a certain gene component associated with different characteristics (A to K) in bacteria under different experimental conditions. I want to show whether any of the conditions leading to the observations (A to K) are correlated with each other, and I'd like to know how to implement this in R. My problem is which statistic to use to show convincingly that the observations in bacteria were possible because the gene components in two or more of the conditions are well correlated with each other.

         Cond1    Cond2    Cond3    Cond4    Cond5    Cond6
    A        0        1        2       16       17       18
    B        1        3        9       23       24       25
    C        0        1       16       30       31       32
    D        0        0       23       19       20       21
    E        0        0       30       26       27       28
    F       15       16        1       33       34       35
    G        0        0        8        1        2        3
    H        0        1       15        8        9       10
    I        0        0       22       15       16       17
    J        1        2       29       22       23       24
    K        0        1        4        5        6        7

Please help!

Rob

Tags: R, genetics, statistics

Answer written 6.2 years ago by dariober (WCIP | Glasgow | UK):

Hello- If you want to show that there is a correlation between any two conditions, you could calculate all the pairwise correlations and correct the p-values for multiple testing. Since your data are percentages, I would either apply the arcsine square-root transformation (commonly used to stabilize the variance of proportions) or use a non-parametric test for correlation (e.g. Spearman). Here's some sample R code.

Just a thought...

# Arcsine square-root transform; expects proportions in [0, 1], not percentages
arcsine <- function(x){
    return(asin(sign(x) * sqrt(abs(x))))
}

# Tab-separated file: first row holds the condition names, first column the row names
dat<- read.table('dat.txt', header= TRUE, row.names= 1, sep= '\t')
dat
#   Cond1 Cond2 Cond3 Cond4 Cond5 Cond6
# A     0     1     2    16    17    18
# B     1     3     9    23    24    25
# C     0     1    16    30    31    32
# D     0     0    23    19    20    21
# E     0     0    30    26    27    28
# F    15    16     1    33    34    35
# G     0     0     8     1     2     3
# H     0     1    15     8     9    10
# I     0     0    22    15    16    17
# J     1     2    29    22    23    24
# K     0     1     4     5     6     7

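# Number of distinct pairs of conditions (15 for 6 columns)
nr<- choose(ncol(dat), 2)
# One row per pair: the two condition names, the correlation and its raw p-value
dat.cor<- data.frame(condA= rep(NA, nr), condB= rep(NA, nr), cor= rep(NA, nr), pval= rep(NA, nr))

n<- 1
for(i in 1:(ncol(dat)-1)){
    for(j in (i+1):ncol(dat)){
        dat.cor$condA[n]<- colnames(dat)[i]
        dat.cor$condB[n]<- colnames(dat)[j]
        # Percentages are divided by 100 to get proportions before transforming
        p<- cor.test(arcsine(dat[,i]/100), arcsine(dat[,j]/100), method= 'pearson')
        dat.cor$cor[n]<- p$estimate
        dat.cor$pval[n]<- p$p.value
        n<- n+1
    }
}
# Correct the raw p-values for multiple testing (Holm's method)
dat.cor$padj<- p.adjust(dat.cor$pval, method= 'holm')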

# Note: the values printed below match Pearson correlations on the
# untransformed percentages (Cond4, Cond5 and Cond6 differ only by a
# constant, hence cor = 1 and pval = 0). With the arcsine transform
# applied as in the loop above, the numbers will differ somewhat.
dat.cor
#    condA condB        cor         pval         padj
# 1  Cond1 Cond2  0.9906504 4.267460e-09 5.120952e-08
# 2  Cond1 Cond3 -0.4096434 2.108681e-01 1.000000e+00
# 3  Cond1 Cond4  0.5106776 1.084494e-01 1.000000e+00
# 4  Cond1 Cond5  0.5106776 1.084494e-01 1.000000e+00
# 5  Cond1 Cond6  0.5106776 1.084494e-01 1.000000e+00
# 6  Cond2 Cond3 -0.4574765 1.571276e-01 1.000000e+00
# 7  Cond2 Cond4  0.5257322 9.671666e-02 1.000000e+00
# 8  Cond2 Cond5  0.5257322 9.671666e-02 1.000000e+00
# 9  Cond2 Cond6  0.5257322 9.671666e-02 1.000000e+00
# 10 Cond3 Cond4  0.2076371 5.401187e-01 1.000000e+00
# 11 Cond3 Cond5  0.2076371 5.401187e-01 1.000000e+00
# 12 Cond3 Cond6  0.2076371 5.401187e-01 1.000000e+00
# 13 Cond4 Cond5  1.0000000 0.000000e+00 0.000000e+00
# 14 Cond4 Cond6  1.0000000 0.000000e+00 0.000000e+00
# 15 Cond5 Cond6  1.0000000 0.000000e+00 0.000000e+00
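
The non-parametric option mentioned in the answer (Spearman) is not shown in the code, so here is a minimal sketch of how it might look. It is not part of the original answer: the data frame is typed in by hand from the question, and the names `pairs`, `res` and `rho` are chosen here for illustration. Spearman's test works on ranks, so no arcsine transform is needed; `exact = FALSE` is set because the data contain ties.

```r
# Data from the question, entered inline so the sketch is self-contained
dat <- data.frame(
    Cond1 = c(0, 1, 0, 0, 0, 15, 0, 0, 0, 1, 0),
    Cond2 = c(1, 3, 1, 0, 0, 16, 0, 1, 0, 2, 1),
    Cond3 = c(2, 9, 16, 23, 30, 1, 8, 15, 22, 29, 4),
    Cond4 = c(16, 23, 30, 19, 26, 33, 1, 8, 15, 22, 5),
    Cond5 = c(17, 24, 31, 20, 27, 34, 2, 9, 16, 23, 6),
    Cond6 = c(18, 25, 32, 21, 28, 35, 3, 10, 17, 24, 7),
    row.names = LETTERS[1:11]
)

# All distinct pairs of conditions: a 2 x 15 character matrix
pairs <- combn(colnames(dat), 2)

res <- data.frame(condA = pairs[1, ], condB = pairs[2, ],
                  rho = NA_real_, pval = NA_real_)

for (k in seq_len(ncol(pairs))) {
    # exact = FALSE: exact p-values cannot be computed when there are ties
    ct <- cor.test(dat[[pairs[1, k]]], dat[[pairs[2, k]]],
                   method = "spearman", exact = FALSE)
    res$rho[k] <- unname(ct$estimate)
    res$pval[k] <- ct$p.value
}

# Holm correction across the 15 tests, as in the answer
res$padj <- p.adjust(res$pval, method = "holm")
res
```

With only 11 observations per condition the p-values here are approximate, so they are best read as a rough guide.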

Many thanks for your help Dario. Just a couple of questions:

Are $estimate and $p.value in these two lines components of the value returned by a built-in function?

dat.cor$cor[n]<- p$estimate

dat.cor$pval[n]<- p$p.value

I'm not sure whether my interpretation of the output is right. Taking the first row as an example:

   condA condB       cor         pval         padj
1  Cond1 Cond2 0.9906504 4.267460e-09 5.120952e-08

The correlation between Cond1 (condA) and Cond2 (condB) is 0.9906504, with an adjusted p-value of 5.120952e-08.

Thanks

Reply written 6.2 years ago by robjohn7000

Hi- Sorry I couldn't reply before. I guess by now you figured it out... Anyway... Yes, p$estimate and p$p.value come from the output of cor.test. And yes again your interpretation of the columns in dat.cor is correct.
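
To make both points concrete, here is a small sketch (not from the original thread). It assumes the Cond1 and Cond2 columns typed in by hand and left untransformed for brevity, and it reuses the raw p-values from the pval column of the table. It shows that cor.test returns a list of class "htest" whose components are extracted with `$`, and how Holm's method arrives at the adjusted p-value in the first row.

```r
# Cond1 and Cond2 from the question, entered inline (untransformed)
cond1 <- c(0, 1, 0, 0, 0, 15, 0, 0, 0, 1, 0)
cond2 <- c(1, 3, 1, 0, 0, 16, 0, 1, 0, 2, 1)

# cor.test() returns a list of class "htest"; `$` extracts its components
p <- cor.test(cond1, cond2, method = "pearson")
class(p)                                 # "htest"
c("estimate", "p.value") %in% names(p)   # TRUE TRUE

r  <- unname(p$estimate)   # the correlation coefficient
pv <- p$p.value            # its unadjusted p-value

# Raw p-values copied from the pval column of the table, in row order
pval <- c(4.267460e-09, 2.108681e-01, rep(1.084494e-01, 3),
          1.571276e-01, rep(9.671666e-02, 3),
          rep(5.401187e-01, 3), rep(0, 3))
padj <- p.adjust(pval, method = "holm")

# Holm multiplies the k-th smallest p-value by (m - k + 1), with m = 15 tests.
# Three p-values of 0 rank ahead of the Cond1/Cond2 test, so its multiplier is 12.
all.equal(padj[1], 12 * pval[1])         # TRUE
```

So padj is the raw p-value scaled up to account for the 15 tests performed, and values that would exceed 1 are capped at 1.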

Reply written 6.2 years ago by dariober