Hi
I am having trouble with the output of my differential gene expression analysis with limma. The problem is that every probe comes out as statistically significant, by both P-value and adjusted P-value.
I have been given an RMA-normalized data set as a csv file. An example is as follows:
sample1group1 sample2group1 sample3group2 sample4group2
probeid_1 7.8165 6.7145 3.142 2.495
probeid_2 6.4586 4.2135 1.245 5.325
probeid_3 4.5241 4.2111 4.456 7.415
...
So, I chose to use the Bioconductor limma package to perform differential expression analysis between the group 1 and group 2 samples. Following the manual, I set up the following R script:
library(limma)
df = read.table(above_table_filename, sep="\t", header=TRUE)
design = c(0, 0, 1, 1) # set 0 for group1 and 1 for group2
row.names(df) <- df$X # set row names
df <- df[-1] # remove the first column now that row.names has been set
df <- df[-(1:n),] # remove the control probes
num_rows <- nrow(df)
fit <- lmFit(df, design) # initialize the limma fit
fit <- eBayes(fit) # recommended Bayesian fit
options(digits=3) # set sig figs
# Provides table of gene ids sorted by P-value
topTable(fit, adjust="fdr", sort.by="P", number=num_rows)
Did I do something wrong in setting up this analysis? The example design matrix in the manual used -1 and 1, but I do not know whether that would make a big difference. Otherwise, I am not sure what I set up incorrectly with the data. I have been given a number of groupings to test, and every one of them gives the same highly significant values for all probes, so it is not something particular to this grouping.
Edit: Just to be clear, I want to find the most highly differentially expressed genes between the two groups of samples, that is, whichever genes are higher or lower (by mean or median) in group 1 than in group 2.
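To make the quantity I am after concrete, here is a minimal base-R sketch using only the three example rows from the table above. The names `expr`, `group`, `mean_diff` and `ranked` are made up for illustration, and this is a plain difference of group means on the log scale, not what limma itself computes.

```r
# Example values copied from the sample table in the post (RMA output is on the log2 scale).
expr <- matrix(
  c(7.8165, 6.7145, 3.142, 2.495,
    6.4586, 4.2135, 1.245, 5.325,
    4.5241, 4.2111, 4.456, 7.415),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("probeid_1", "probeid_2", "probeid_3"),
    c("sample1group1", "sample2group1", "sample3group2", "sample4group2")
  )
)

# Hypothetical group labels, one per column, in column order.
group <- factor(c("group1", "group1", "group2", "group2"))

# Per-probe difference in mean log-expression: group2 minus group1.
mean_diff <- rowMeans(expr[, group == "group2", drop = FALSE]) -
  rowMeans(expr[, group == "group1", drop = FALSE])

# Probes ordered by absolute difference, largest first.
ranked <- mean_diff[order(abs(mean_diff), decreasing = TRUE)]
print(ranked)
```

A ranking like this ignores the within-group variability, which is the part I was hoping the limma fit would account for.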
Edit2: I should mention that my columns in the actual file are not neatly organized as above. The groups are mixed randomly between the columns, so if 0 = group1 and 1 = group2, then I would have a vector like (as an example):
c(1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0)
Is there a good way to account for this when making the design matrix, in case I made it incorrectly?
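For reference, one way such a design might be built from a membership vector in arbitrary column order is sketched below, using base R's `model.matrix`. The names `membership` and `group` are hypothetical, and the sketch assumes the vector is in the same order as the columns of the expression table.

```r
# Group membership per column, in the column order of the data
# (the example vector from the post: 0 = group1, 1 = group2).
membership <- c(1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0)

# Turn the 0/1 codes into a labelled factor so the design columns are readable.
group <- factor(membership, levels = c(0, 1), labels = c("group1", "group2"))

# Two-column design: the intercept column estimates the group1 mean,
# and the "groupgroup2" column estimates group2 minus group1.
design <- model.matrix(~ group)
print(design)
```

If this is the right setup, the matrix would be passed as `lmFit(df, design)` and the group comparison read from `topTable(fit, coef = "groupgroup2", ...)`. As far as I can tell, a bare 0/1 vector is instead treated as a single column with no intercept, so its coefficient would be the group2 mean itself and not a difference between groups. The columns do not need to be sorted by group, since each row of the design corresponds to one column of the data.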