Question: NO FDR <0.05 in the whole dataset
gravatar for camillab.
12 days ago by
camillab.40 wrote:

Hi !

I am trying to calculate the FDR for the p.values in my bulk-RNAseq dataset (unfiltered, so on 21011 genes total, the dataset is composed by 2 conditions and it has 3 samples/condition) but I am not particularly strong in stats so I am confused in the output (so apologies if the question is "stupid"). I did used qvalue package and here is the code:

qobj <- qvalue(p = test$p_value)
summary(qobj) #summary showing no significant genes

obtaining the following results (from summary(qobj) ):

qvalue(p = test$p_value)

pi0:    0.4549629   

Cumulative number of significant calls:

          <1e-04 <0.001 <0.01 <0.025 <0.05 <0.1
p-value        4     88   837   1858  3284 5292
q-value        0      0     0      0     0   38
local FDR      0      0     0      0     0    0
p-value   21011
q-value   21011
local FDR 20988

and this is the histogram histogram

so I have 2 questions:

  1. Is it possible that I do not get any significant value (not even 1) ? (I did also try p.adjust(test$p_value,method="BH") obtaining again 0 significant FDR)

  2. Is it wrong to calculate the FDR only on a smaller group of genes?

Basically looking here blog it seems that my pvalues are Bimodal (scenario C) and he suggests that

Don’t apply false discovery rate control to these p-values yet.

and one of the options he suggests is to

Do all the p-values close to 1 belong to some pathological case? An example from my own field: RNA-Seq data, which consists of read counts per gene in each a variety of conditions, will sometimes include genes for which there are no reads in any condition. Some differential expression software will report a p-value of 1 for these genes. If you can find problematic cases like these, just filter them out beforehand (it’s not like you’re losing any information!)

And in my dataset unfiltered I have genes, as he mentioned, there are genes with no reads (better, it has been assigned 0.1). Do you, in summary, recommend leaving it or subset the genes or..?

thank you in advance for all the suggestions


rna-seq qvalue R fdr • 96 views
ADD COMMENTlink modified 12 days ago by rpolicastro4.0k • written 12 days ago by camillab.40

in my bulk-RNAseq dataset (unfiltered, so on 21011 genes total, the dataset is composed by 2 conditions and it has 3 samples/condition)

I think your description is leaving an obvious gap in the details: how was the statistical test conducted? Please specify which approach was used, e.g. DESeq2, edgeR, limma, or something else? If your result is not from one of these, you should first run a state-of-the-art aproach and check again. For example, if you used a simple t.test this would be the wrong approach. The named packages already provide FDR corrected qvalues, thus, given a standard approach is taken, there is no need to adjust p.values on your own. Please investigate that first and provide more background on the analysis.

ADD REPLYlink modified 12 days ago • written 12 days ago by Michael Dondrup48k
gravatar for ATpoint
12 days ago by
ATpoint46k wrote:

Same problem as here: A: p.adjusted p value Experiment is underpowered and you most likely use inappropriate statistics (=generating the pvalues) for RNA-seq. Not more to say, with the given analysis strategy and power you have no DEGs after correction, no matter how you do it, the code for BH is correct though.

ADD COMMENTlink written 12 days ago by ATpoint46k

ok! thank you! Then I will leave it

ADD REPLYlink written 12 days ago by camillab.40
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1339 users visited in the last hour