Question: Limma strange low p-values.
1
gravatar for lucasecardozo
4.3 years ago by
lucasecardozo10 wrote:

I'm analysing a microarray (single-channel) study with 2 experimental groups, each one with only 2 samples. I wanted to compare the two groups in order to find DE genes, so, I ran limma analysis on this dataset. The results were weird...lots of genes w/ very low p-values (lowest p-value: 6.144e-213). I made a volcano plot to see what was happening with the data...got this weird volcano plot: http://s33.postimg.org/mxnx39j67/weird_volcano.png

Do you guys know what kind of stuff would lead to that?

>print(annot)
     GEO_ID      group
1 GSM321605 uninfected
2 GSM321606 uninfected
3 GSM321607   infected
4 GSM321608   infected

>design <- model.matrix(~0 +annot[, "group"])
>head(design)
  annot[, "group"]infected annot[, "group"]uninfected
1                        0                          1
2                        0                          1
3                        1                          0
4                        1                          0

>colnames(design) <- c("infected", "uninfected")
>cm <- makeContrasts(InfvsUninf = infected-uninfected, levels=design)
>print(cm)
            Contrasts
Levels       InfvsUninf
  infected            1
  uninfected         -1

>fit <- lmFit(exprs, design)
>fit2 <- contrasts.fit(fit, cm)
>fit2 <- eBayes(fit2)
>results <- topTable(fit2, "InfvsUninf", number=Inf)
>head(results)
   ID        logFC   AveExpr     t        P.Value         adj.P.Val
IDO1        7.496098 8.077363  31.43580 6.144487e-213 8.017326e-209
TNFAIP6     6.945901 7.551685  29.12848 1.359307e-183 8.868116e-180
RSAD2       6.532472 8.066029  27.39471 6.405851e-163 2.786118e-159
IFI27       6.230206 7.727492  26.12712 1.461824e-148 4.768471e-145
IFITM1      6.128964 6.775055  25.70255 6.766299e-144 1.765733e-140
INHBA       5.915550 7.068516  24.80758 2.674626e-134 5.816421e-131
microarray limma R • 2.7k views
ADD COMMENTlink modified 3.7 years ago by RC0 • written 4.3 years ago by lucasecardozo10
1

The results do not match with the volcano plot, negative logFC in your results but postive in the Volcanoplot (for those with extreme p-values)

ADD REPLYlink written 4.3 years ago by WouterDeCoster44k

Sorry! I pasted the results of the reverse comparison (UninfvsInf). Thanks for noticing, I'll edit the post!

ADD REPLYlink written 4.3 years ago by lucasecardozo10

your t values should also be positive

ADD REPLYlink written 4.3 years ago by Ar990

Oh, of course! Thank you for noticing!

ADD REPLYlink written 4.3 years ago by lucasecardozo10

can you post a head of your results? also, how did you code your design matrix etc etc?

ADD REPLYlink written 4.3 years ago by TriS4.2k

Done! Added the code to the original post.

ADD REPLYlink written 4.3 years ago by lucasecardozo10

Why is the design matrix (~0 +annot[, "group"]) I think the 0 indicates that you don't want to use an intercept term, but in this design you should really keep a common base between the samples.

ADD REPLYlink written 4.3 years ago by karl.stamm3.8k

It is correct. If limma uses contrasts then it automatically inserts an extra intercept. Its in the code.

ADD REPLYlink written 4.3 years ago by Ar990

Did u fix it? I meet the same problem. could u share some suggestions, please?

ADD REPLYlink written 3.7 years ago by RC0
1

This looks like a comment rather than an answer. Please, add comments using the ADD COMMENT button under each post. There is an accepted answer to this questions that both explains the cause of the original problem and provides some solutions. Is this not working for you? If you have a different issue please ask a new question instead. Thank you.

ADD REPLYlink written 3.7 years ago by ddiez1.9k

You are right, I moved the post to a comment now.

ADD REPLYlink written 3.7 years ago by WouterDeCoster44k
7
gravatar for Ar
4.3 years ago by
Ar990
United States
Ar990 wrote:

I think one of the reasons that you are getting weird p-values is because the filtering of the dataset is not proper or the standard error across all the probes in your dataset is large. Limma calculates moderated t-statistic which uses posterior variance rather than sample variance in t-test. Or, in the layman language it is is the ratio of the M-value to its standard error similar to T-statistic. However, in this case the standard errors are moderated across all genes/ probes in your exprs. You may find more about it here. I would recommend you to use all the probes or use varFilter to filter the low expressed probes/genes.

Code:

> dim(exprs)
[1] 54675     4
> design <- model.matrix(~ group + 0)
> colnames(design) <- levels(group)
> head(design)
  uninfected infected
1        1          0
2        1          0
3        0          1
4        0          1
> cont.matrix <- makeContrasts(InfvsUninf = infected-uninfected, levels=design)
> print(cont.matrix)
            Contrasts
Levels       InfvsUninf
  infected            1
  uninfected         -1
> fit1 <- lmFit(exprs, design)
> fit2 <- contrasts.fit(fit1, cont.matrix)
> fit2 <- eBayes(fit2)
Warning message:
Zero sample variances detected, have been offset 
> results <- topTable(fit2, "InfvsUninf", number=nrow(exprs.dat))
> head(results)
                logFC  AveExpr      t      P.Value    adj.P.Val        B
210029_at   7.496098 8.077363 53.69020 5.996081e-09 0.0001206758 9.926007
202411_at   6.230206 7.727492 52.40836 6.882552e-09 0.0001206758 9.869859
201601_x_at 5.956054 7.549371 52.02982 7.173197e-09 0.0001206758 9.852669
222838_at   5.748147 7.996823 49.34098 9.709206e-09 0.0001206758 9.721920
206025_s_at 6.945901 7.551685 48.24558 1.103574e-08 0.0001206758 9.663952
201108_s_at 5.564837 7.035012 45.45633 1.549903e-08 0.0001401571 9.502414

Now, filtering the low variable probes and hence, the value of moderate t-statistic (t column) changes. P-value is dependent upon t value and therefore, it become lower than the previous one.

> library(genefilter)
> exprs = varFilter(exprs)
> dim(exprs)
[1] 27337     4
> design <- model.matrix(~ group + 0)
> colnames(design) <- levels(group)
> head(design)
  uninfected infected
1        1          0
2        1          0
3        0          1
4        0          1
> cont.matrix <- makeContrasts(InfvsUninf = infected-uninfected, levels=design)
> print(cont.matrix)
            Contrasts
Levels       InfvsUninf
  infected            1
  uninfected         -1
> fit1 <- lmFit(exprs, design)
> fit2 <- contrasts.fit(fit1, cont.matrix)
> fit2 <- eBayes(fit2)
> results <- topTable(fit2, "InfvsUninf", number=nrow(exprs.dat))
> head(results)
                logFC  AveExpr      t      P.Value    adj.P.Val         B
210029_at   7.496098 8.077363 33.85290 3.055145e-50 8.351851e-46 103.28349
206025_s_at 6.945901 7.551685 31.34093 1.140235e-47 1.558530e-43  97.62137
213797_at   6.532472 8.066029 29.35202 1.661339e-45 1.513868e-41  92.82696
202411_at   6.230206 7.727492 28.24293 3.021780e-44 2.065160e-40  90.02318
204698_at   6.131048 7.734954 27.51300 2.145762e-43 1.173174e-39  88.12390
214022_s_at 6.128964 6.775055 27.39214 2.980586e-43 1.358005e-39  87.80514

In case you are interested in top ~ 5000 genes/probes only then use varFilter(exprs, var.cutoff = 0.9). Try playing with various different values of var.cutoff to get desired number of genes.

ADD COMMENTlink modified 4.3 years ago • written 4.3 years ago by Ar990
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1546 users visited in the last hour