I am trying to calculate the FDR for the p.values in my bulk-RNAseq dataset (unfiltered, so on 21011 genes total, the dataset is composed by 2 conditions and it has 3 samples/condition) but I am not particularly strong in stats so I am confused in the output (so apologies if the question is "stupid"). I did used
qvalue package and here is the code:
library(qvalue) qobj <- qvalue(p = test$p_value) summary(qobj) #summary showing no significant genes hist(qobj)
obtaining the following results (from
Call: qvalue(p = test$p_value) pi0: 0.4549629 Cumulative number of significant calls: <1e-04 <0.001 <0.01 <0.025 <0.05 <0.1 p-value 4 88 837 1858 3284 5292 q-value 0 0 0 0 0 38 local FDR 0 0 0 0 0 0 <1 p-value 21011 q-value 21011 local FDR 20988
and this is the histogram
so I have 2 questions:
Is it possible that I do not get any significant value (not even 1) ? (I did also try
p.adjust(test$p_value,method="BH")obtaining again 0 significant FDR)
Is it wrong to calculate the FDR only on a smaller group of genes?
Basically looking here blog it seems that my pvalues are Bimodal (scenario C) and he suggests that
Don’t apply false discovery rate control to these p-values yet.
and one of the options he suggests is to
Do all the p-values close to 1 belong to some pathological case? An example from my own field: RNA-Seq data, which consists of read counts per gene in each a variety of conditions, will sometimes include genes for which there are no reads in any condition. Some differential expression software will report a p-value of 1 for these genes. If you can find problematic cases like these, just filter them out beforehand (it’s not like you’re losing any information!)
And in my dataset unfiltered I have genes, as he mentioned, there are genes with no reads (better, it has been assigned 0.1). Do you, in summary, recommend leaving it or subset the genes or..?
thank you in advance for all the suggestions