Anova P-Values
1
2
Entering edit mode
10.0 years ago
Diana ▴ 900

Hi everyone,

I'm trying to do an ANOVA on normalized microarray data. My input file looks like this:

Probe_ID    control.CEL    sample1.CEL    sample2.CEL    sample3.CEL
AB112960    2.273088453    2.475903455    2.094961932    2.331040798
AF004856    2.466043833    2.809889315    2.257952048    2.492190838


I have replicates for each (control, sample1, sample2, sample3). I have run ANOVA in r using this script:

aovdata <- read.csv("microarray.csv", header = TRUE)
aovdata1 = aovdata[,2:13]
row.names(aovdata1)<-aovdata\$Probe_ID
tissue<-gl(4,3,labels=c("control","aoi","ap","pp"))
aof <- function(x)
{m<-data.frame(tissue, x);
anova(aov(x ~ tissue))}
anovaresults <- apply(aovdata1, 1,aof)
str(anovaresults)


After performing ANOVA, I'm having trouble extracting the Probe Ids along with their p-value. How can I just extract the probe Ids and the corresponding p-value from ANOVA test? When I view summary(anovaresults), it gives me some different kind of information than str(anovaresults) ?

Thanks a lot!!!

r • 6.9k views
0
Entering edit mode

Is there a reason you're not using limma? That would be the more standard way to analyze microarray data and results in more convenient data.frames. Aside from that, anova returns an object, which is being coerced into a vector by as.vector. The apply command then returns a matrix of those vectors. You might just run the anova on a single probe and compare those results before and after sending them through as.vector(). That should give you an idea of how everything is formatted. Personally, I would store the anova results in something and then return some sort of meaningfully formatted output from it. Then, you would have better control over things.

0
Entering edit mode

limma is definitively done for that. It also adds some very useful functionalities like FDR adjustment which might be relevant to use as you run multiple tests. http://www.statsci.org/smyth/pubs/limma-biocbook-reprint.pdf

2
Entering edit mode
10.0 years ago
zx8754 11k

If you are after Pvalues only then try something like this:

aof <- function(x) {
m<-data.frame(tissue, x);
anova(aov(x ~ tissue)))[,"Pr(>F)"]
}


http://stackoverflow.com/questions/3366506/extract-p-value-from-aov

0
Entering edit mode

Thanks a lot!!! but I'm getting this error: Error in summary(anova(aov(x ~ tissue)))[[1]][["Pr(>F)"]] : subscript out of bounds

0
Entering edit mode

I updated the answer, can you try again?

0
Entering edit mode

Thanks. i think there is some problem with the code I have written because the function you've given me works but the output is like this: 0.184097265 0.908233888 0.359941167 0.975710791 0.774837269 NA NA NA NA NA with numbers in 1 row and NAs in 2nd row and its not loading completely. I have 38535 genes and 12 columns only. I don't understand why its doing that. Its not giving me the output as gene IDs in 1st column and p-values in 2nd column

1
Entering edit mode

The NAs are from the "Residuals" row (try running the anova on a single row and you'll see this). Just use [1, "Pr(>F)"] instead, keeping in mind that this could fail if you start making more complicated designs. The function, as written, will never give you gene names as output, since the "anova" function doesn't care about the row names and, thus, won't return them. Perhaps returning 'c(rownames(x), anova(lm(x~tissue))[1,"Pr(>F)"])' would work, though I can't say for sure.

0
Entering edit mode

Thanks !!! it works!