Question: Histogram plot of padj values
gravatar for cado
9 months ago by
cado0 wrote:


I'm carrying out DE analysis on RNASeq data in DESeq2. I have a question about the histogram plots I used to plot the p-adj values of my results. I'm referring to this post; to best explain my histogram plots!

I have two sets of results; X and Y. Result X was calculated first with no log fold threshold and padj < 0.05. When I plotted the padj values of Result X on a histogram I got an Anti-conservative histogram (similar to A in the website I'm referring to) with a large number of padj values < 0.05.

Result X resulted in a large number of DE genes so I decided to switch to Result Y where a log fold threshold of 1 was applied and then an FDR <0.05. I now have a set of genes that is more managable to do DE analysis on, however my concern with Result Y is that when I plotted the histogram of the padj values it resulted in a conservative plot (similar to D in the website) with a larger number of padj values = 1 than those < 0.05.

My question is; is histogram Y a concern? Going by the post attached is there something wrong with my dataset or as he mentioned is applying a log fold threshold a correction method? As long as histogram X is ok then my data is good to do statistical analysis on? Also I'm trying to understand what's happening the padj values in Result Y when the log fold threshold is applied? Is it the genes excluded from the data because of the threshold and they have no padj value assigned?


I get the same histogram when I plot the p-values of both results.

I'm count data is stored in the "transcript" dataframe under 2 conditions(pre and post) and 3 replicates per group. Design matrix is stored in the "coldata" dataframe.

Here's the code:

[1] 4874    6

T1_pre        pre
T1_post      post
T2_pre        pre
T2_post      post
T3_pre        pre
T3_post      post

dds <- DESeqDataSetFromMatrix(countData = transcripts,
                              colData = coldata, 
                              design = ~ condition)

dds <- DESeq(dds)    

#Result X
res <- results(dds)

> table(res$padj < 0.05)

 1095  3773 

#Result Y
resLFC <- results(dds, lfcThreshold = 1)

> table(resLFC$padj < 0.05)

 3991   877 

hist( res$pvalue, breaks=20, col="grey" )
hist( resLFC$pvalue, breaks=20, col="grey" )

Histogram of results no log fold threshold:

Histogram of results log fold threshold < 1:

statistics histograms rna-seq • 373 views
ADD COMMENTlink modified 9 months ago by RamRS27k • written 9 months ago by cado0

You should plot the nominal p-values histogram ; not the adjusted ones. Also could you describe the different commands you used to build your results i.e. the DESeq2 commands and also define the number of samples, number of groups, experimental design, etc...

ADD REPLYlink modified 9 months ago • written 9 months ago by Nicolas Rosewick8.8k

To quote the linked page:

Make a histogram of your p-values. Do this before you perform multiple hypothesis test correction, false discovery rate control, or any other means of interpreting your many p-values.

ADD REPLYlink written 9 months ago by Jean-Karim Heriche22k

Please do not add answers unless you're answering the top level question. You can edit your post and add these details. Your images are also added improperly. I'll clean up your post, but please put some more effort in the future.

ADD REPLYlink modified 9 months ago • written 9 months ago by RamRS27k

How to add images to a Biostars post

ADD REPLYlink written 9 months ago by WouterDeCoster43k

I was checking the website that you provided. This might be helpful: "Don’t apply false discovery rate control to these p-values yet. (Why not? Because some kinds of FDR control are based on the assumption that your p-values near 1 are uniform. If you break this assumption, you’ll get way fewer significant hypotheses. Everyone loses)."

ADD REPLYlink written 9 months ago by Mahdiar Sadeghi0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 814 users visited in the last hour