Question: qvalues are the same, thus yielding insignificant results
0
gravatar for ulapei
8 months ago by
ulapei20
ulapei20 wrote:

Hi,

I am trying to perform differential gene analysis using limma package along with qvalue package on several Affy microarrays. Basically I am comparing two conditions, let's say treated and untreated. I found that after running:

library(limma)
library(annaffy)
phenoData <- read.AnnotatedDataFrame("phenodata.txt")
data <- ReadAffy(celfile.path=".")
combinations <- factor(paste(pData(phenoData)[,1], sep = "_"))
design <- model.matrix(~combinations)
fit <- lmFit(eset, design)
fit <- eBayes(fit)
table <- topTable(fit,coef=2, 10000, adjust="fdr")
library(qvalue)
pvalues <- table$P.Value
hist(table$P.Value, nclass = 20) 
qobj <- qvalue(p = table$P.Value, lambda=0)
summary(qobj)
pi0 <- qobj$pi0 
qvalues <- qobj$qvalues 
localFDR <- lfdr(p = table$P.Value, lambda = 0)
write.qvalue(qobj,file="qvalue.txt",sep="\t")

q-values I obtained from "qpackage" and p.adjusted values from "fdr" adjustment for multiple testing result in the same values for hundreds of genes making only 2 genes significant. I strongly believe that is not the actual case, as you can see great fold change for many of the genes. In addition, many p values fall below 0.05 indicating the significance of the genes. I was wondering if anyone could explain to me why this is the output I get and how should I overcome it to produce an accurate list of differentially expressed genes. Thank you so much.

Sincerely, Ula

ADD COMMENTlink modified 8 months ago • written 8 months ago by ulapei20

You don't need to bookmark your own post. Choose "Follow via email" to be notified of any new comments/answers.

ADD REPLYlink written 8 months ago by genomax65k

Follow by email is default for all of a user's new posts.

ADD REPLYlink written 8 months ago by RamRS21k

ulapei : For future reference. How to add images to a Biostars post

ADD REPLYlink written 8 months ago by genomax65k
2
gravatar for Kevin Blighe
8 months ago by
Kevin Blighe41k
Kevin Blighe41k wrote:

Hey Ula,

Q values are adjusted P values. P values are also referred to as 'nominal' or 'un-adjusted' P values. They can be adjusted to control the false discovery rate, and various metrics exist to adjust these, including:

  • Benjamini-Hochberg
  • Bonferroni
  • Holm
  • Benjamini-Yekutieli

Thus, you do not need to use this qvalue package for anything. Limma's topTable() function reports both the P (un-adjusted) and Q (FDR) values. They will be in your object called 'table'. You instructed Limma to adjust your P values by Benjamini-Hochberg via adjust="fdr".

Keep in mind that is quite possible to have very large fold-changes and no statistical significance (and vice-versa). This can occur, for example, when comparing genes of low expression.

For a deeper discussion, read the word by the Limma developer himself: Question: conceptual question about FDR, FDR adjusted p-value and q-value.

Kevin

ADD COMMENTlink modified 8 months ago • written 8 months ago by Kevin Blighe41k

Hi Kevin,

From a bit of reading I would say that q values and p-adjusted values have a similar concept but are a bit different which makes sense because I get different values depending on the criteria I use.

For instance the output for gene x:

using 'limma', adjusted with 'fdr' yields: P.value = 6.074500e-06 p. adjusted value = 0.13698300

This yields one significant gene (out of 10k). Using p value I get 1108 genes that satisfy p<0.05. However, adjusted p value for those genes is 0,251946276. This goes for 1105/1108 genes. That's why I started not trusting p. adjusted value using limma because I know that some of those genes should be differentially expressed (I have mol. bio background). I have also tried using no adjustment and run a qvalue package on it then, but the results were still the same, so maybe it is not the adjustment after all?

using 'qvalue' package: p = 6,07E-06 q = 0.030372498 fdr = 0.052665658

This yields 3 significant genes. As you can see it definitely makes a difference as using 'qvalue' the gene would now be considered significantly expressed due to <0.05 cut off. However, for the rest of those same 1106/1108 genes I also get the same q value which is 0,251946276 which I believe is also inaccurate.

I am new to R and I would say my stats knowledge isn't that great, but I tried troubleshooting for quite a bit now and I really don't know why this discrepancy would arise.

Thanks again!

ADD REPLYlink written 8 months ago by ulapei20
1

Thank you for clarifying. Perhaps you should consider the fact that your experimental setup may not be conducive to obtaining statistically significantly differentially expressed genes (?). You could consider increasing the threshold (for your Limma-obtained Q values) to 0.1 (in place of 0.05). I have even seen people use 0.2 in the past.

ADD REPLYlink modified 8 months ago • written 8 months ago by Kevin Blighe41k
1
gravatar for andrew.j.skelton73
8 months ago by
London
andrew.j.skelton735.6k wrote:

I'm assuming here that you're trying the qvalue package as an alternative method of FDR control, as you're not seeing statistically significant genes passing your traditional BH FDR threshold. I'd recommend a few things here:

  1. Feature reduction - A lot of probes on your microarray are going to be uninformative, look at detection threshold measures in Affy packages that give you an indication that probes are signal or noise. Probes that are uninformative in this sense can be removed, thus reducing the number of tests you're performing.

  2. Data Visual 1 - Volcano Plot of your differentially expressed genes. Gives you an idea of how the data looks, and if anything looks unexpected.

  3. Data Visual 2 - P value histogram. Look at your unadjusted p values, and plot a simple histogram, this can be extremely informative as to how your hypothesis fits the data. Check out this great blog post to get an idea of what you should expect

Finally, just because some genes show an unadjusted p-value <0.05, does not mean that in the context of your experiment, ie all the features you're testing, that they're not false positives - this is the point to FDR. Check out the dead salmon poster to get a good handle on why FDR control is important.

ADD COMMENTlink written 8 months ago by andrew.j.skelton735.6k

Yes, that is correct. I have used gcRMA normalization method to account for background background correction prior using limma. Do I need to use any other package in conjunction to gcRMA or is that enough? I have read that gcRMA works the best with Affy 3' IVT microarrays. ![enter image description here][1]

V plot that I get for this particular experiment: vplot P histogram (without multiple adjusting) : Capture7

I suppose both plots look abnormal, but at this point I really don't know how to continue

P histogram looks the same regardless if it is adjusted with 'fdr' or 'none' aka no multiple testing adjustment is made

ADD REPLYlink modified 8 months ago by genomax65k • written 8 months ago by ulapei20
1

I've done it once before with MAS5 calls, but it's been a while since I've used affy chips. The below code finds probes that have absent calls in all samples (although you could modify this to 90% of samples for example). It removes those probes, and control probes that you don't want to test. Give that a shot, and then afterwards if there are no statistically significant results, then consider looking at covariates.

data.norm           <- rma(data.raw)
data.mas5.calls     <- mas5calls(data.raw) %>% 
                       exprs %>%
                       {. == 'A'} %>% 
                       rowSums %>% 
                       .[. == ncol(data.norm)] %>% 
                       names
data.norm.filtered  <- data.norm %>% 
                       .[!grepl("^AFFX", rownames(.)),] %>% 
                       .[!(rownames(.) %in% data.mas5.calls),]
ADD REPLYlink written 8 months ago by andrew.j.skelton735.6k

Hi! I have tried it with MAS5. Now the plots I get are depicted below.

https://ibb.co/gXdKJo

https://ibb.co/eiLAPT

The plots seem much better, however, I still obtain that only one gene is differentially expressed.

ADD REPLYlink modified 8 months ago by andrew.j.skelton735.6k • written 8 months ago by ulapei20
1

Your p-value distribution is close to random uniform, and your volcano plot is using unadjusted P-values, so I'd suggest that everything looks as expected. You'll need to do further investigation with tools such as PCA to get an experiment wide view of the data, or look at covariates that may be muddying the waters on your effect of interest.

ADD REPLYlink written 8 months ago by andrew.j.skelton735.6k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 974 users visited in the last hour