Question: Troubleshooting differential expression with limma
I am having some trouble with the output of my differential gene expression analysis with limma: every probe comes out statistically significant, by both the raw P-value and the adjusted P-value.

I have been given an RMA-normalized data set, provided as a csv file. An example is as follows:

            sample1group1   sample2group1   sample3group2   sample4group2
probeid_1   7.8165          6.7145          3.142           2.495
probeid_2   6.4586          4.2135          1.245           5.325
probeid_3   4.5241          4.2111          4.456           7.415
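A minimal sketch of reading a table laid out like this, assuming it is whitespace- or tab-delimited with probe IDs in the first column. The example rows are pasted inline via `text =` so the snippet runs on its own; for a real file, pass the file name instead (with `sep = ","` if it is truly comma-separated). `row.names = 1` tells `read.table` to use the first column as row names, whether or not the header line has a label for it.

```r
# Example rows from the table above, pasted inline (replace with a file path in practice)
txt <- "sample1group1 sample2group1 sample3group2 sample4group2
probeid_1 7.8165 6.7145 3.142 2.495
probeid_2 6.4586 4.2135 1.245 5.325
probeid_3 4.5241 4.2111 4.456 7.415
"

# row.names = 1: the first column (probe IDs) becomes the row names
df <- read.table(text = txt, header = TRUE, row.names = 1)

# Numeric matrix: probes in rows, samples in columns
expr <- as.matrix(df)
print(expr)
```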

So, I chose the Bioconductor limma package to perform differential expression analysis between group 1 and group 2. Following the manual, I set up the following R script:

library(limma)
# first column (probe IDs) becomes the row names; use sep="," if the file is really comma-separated
df <- read.table(above_table_filename, sep="\t", header=TRUE, row.names=1)
design <- c(0, 0, 1, 1) # single 0/1 column: 0 for group1, 1 for group2
df <- df[-(1:n), ] # remove the control probes (the first n rows; n must be set beforehand)
num_rows <- nrow(df)
fit <- lmFit(as.matrix(df), design) # fit a linear model to each probe
fit <- eBayes(fit) # recommended empirical Bayes moderation of the standard errors
options(digits=3) # print 3 significant digits
# Table of all probes sorted by P-value, with FDR-adjusted P-values
topTable(fit, coef=1, adjust.method="fdr", sort.by="P", number=num_rows)
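`lmFit` fits an ordinary least-squares linear model to every probe (and `eBayes` then moderates the standard errors), and a design given as the plain vector `c(0, 0, 1, 1)` is treated as a single column with no intercept. Fitting one probe with base R's `lm()` therefore shows what that coefficient means. A sketch using the `probeid_1` row of the example table: without an intercept, the group 1 mean is forced to zero, so the coefficient is the group 2 mean itself and its test asks whether group 2's log-expression differs from zero, not whether the groups differ.

```r
# Values for probeid_1 from the example table
y <- c(7.8165, 6.7145, 3.142, 2.495)
x <- c(0, 0, 1, 1)  # the design vector from the script: 0 = group1, 1 = group2

# Only the 0/1 column, no intercept (what design = c(0, 0, 1, 1) describes)
fit_no_int <- lm(y ~ 0 + x)

# Intercept (group1 mean) plus a group2-minus-group1 column
fit_int <- lm(y ~ x)

coef(fit_no_int)  # x = mean of group2 (about 2.82)
coef(fit_int)     # (Intercept) = mean of group1 (about 7.27), x = difference (about -4.45)
```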

Did I do something wrong in setting up this analysis? The manual's example design matrix used -1 and 1, but I do not know whether that would make a big difference. Otherwise, I am not sure what I set up incorrectly. I have been given a number of groupings to test, and every one of them gives the same highly significant values for all probes, so the problem is not specific to this grouping.
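How the groups are coded matters less than whether the design also has an intercept column. A sketch comparing the single-column design from the script with two common two-group parameterisations built by base R's `model.matrix()` (stats package); the group labels are assumed from the example column names.

```r
# Sample labels for the four example columns
group <- factor(c("group1", "group1", "group2", "group2"))

# Design from the script: one 0/1 column, no intercept
design_vec <- as.matrix(c(0, 0, 1, 1))

# Treatment contrasts: intercept (group1 mean) + group2-vs-group1 column
design_int <- model.matrix(~ group)

# Cell means: one column per group; the difference is formed later with a contrast
design_means <- model.matrix(~ 0 + group)

print(design_int)
print(design_means)
```

With `design_int`, coefficient 2 (`groupgroup2`) is the group difference; with `design_means`, the difference would be obtained through limma's `makeContrasts()` and `contrasts.fit()` before `eBayes()`.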

Edit: To be clear, I want to find the most strongly differentially expressed genes between the two groups of samples, i.e. the genes whose average (or median) expression in group 1 is higher or lower than in group 2.
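The quantity described here is, per probe, the difference between group means on the log2 scale that RMA produces; for a least-squares fit with an intercept-plus-group design, this equals the coefficient limma reports as `logFC`. A sketch on the three example probes, computing both the mean and median versions (with two samples per group they coincide). Ranking by this difference alone ignores within-group variability, which is what the moderated t-statistic from `eBayes` adds.

```r
# Example expression values (probes x samples) from the table above
expr <- matrix(
  c(7.8165, 6.7145, 3.142, 2.495,
    6.4586, 4.2135, 1.245, 5.325,
    4.5241, 4.2111, 4.456, 7.415),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("probeid_", 1:3),
                  c("sample1group1", "sample2group1", "sample3group2", "sample4group2"))
)
in_group2 <- c(FALSE, FALSE, TRUE, TRUE)

# Group2 mean minus group1 mean on the log2 scale (a log fold change)
mean_diff <- rowMeans(expr[, in_group2]) - rowMeans(expr[, !in_group2])

# Median-based alternative
median_diff <- apply(expr[, in_group2], 1, median) - apply(expr[, !in_group2], 1, median)

print(mean_diff)
```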

Edit2: I should mention that the columns in my actual file are not neatly ordered as above. The groups are mixed randomly across the columns, so if 0 = group1 and 1 = group2, I would have a vector like (as an example):

c(1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0)

Is there a good way to account for this when building the design matrix, in case I set it up incorrectly?
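The columns do not need reordering: each row of the design matrix only has to describe the corresponding column of the expression matrix. A sketch building the design from the example membership vector above, assuming the vector is in the same order as the data columns (worth checking against `colnames()` of the data).

```r
# Group code for each data column, in column order (0 = group1, 1 = group2)
membership <- c(1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0)

# Labelled factor; group1 is the first level, so it is the baseline
group <- factor(membership, levels = c(0, 1), labels = c("group1", "group2"))

# Intercept + group2-vs-group1 column; row i describes data column i
design <- model.matrix(~ group)

table(group)
head(design)
```

Passing this design to `lmFit()` and then calling `topTable(fit, coef = 2, ...)` (or `coef = "groupgroup2"`) would report the group2-minus-group1 difference for each probe.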

