Question

EdgeR analysis p-value plateau formation

0

Entering edit mode

7.1 years ago

Laura • 0

Dear all,

I have performed various sub-phenotype differential expression analysis using EdgeR on RNAseq data. Overall, we had a sample size of 127 samples, the comparison containing the smallest sample sizes contains n= 24 vs n=28 samples.

There is a paper by Ching et al (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4201821/) suggesting that from n=25 in each group EdgeR should exhibit a power of 0.8.

But when looking at the p-values generated by the analysis there is a "plateau formation" see figure below https://postimg.org/image/4suy3i5c7/

Analysis on the same data set just other sub-phenotypes (slightly bigger sample size) works well, see below: https://postimg.org/image/6x62j5645/

Following discussion with the bioinformatics team, the one explanation they could give me is that samples size was the issue but reading up on it more and more, I do not believe this should cause such a clear effect. Although I cannot seem to get to the bottom of what would cause this either.

Has anyone seen this before or know of a different possibility why this might occure? Any help is much appreciated!

Laura

p.s. See p-value distribution plots odd one: https://postimg.org/image/nb4zqi6cb/

good one: https://postimg.org/image/spljz0cct/

EdgeR p-values Power samples size • 2.5k views

ADD COMMENT • link 7.1 years ago by Laura • 0

0

Entering edit mode

I guess it's due to the multi-testing adjustment. which adjustment method is used by edgeR ?

ADD REPLY • link 7.1 years ago by Nicolas Rosewick 11k

0

Entering edit mode

It uses the Benjamini-Hochberg method. The multiple-testing adjustment method is the same for the 2nd graph though and that worked fine. I performed 3 other analysis using the same script, it is just this one subset that seems to come back odd. Really puzzled by it.

If you manually look at the p-values you see that the all the transcripts have very similar p-values e.g. its not just the q-values that 'level out/plateau'.

ADD REPLY • link 7.1 years ago by Laura • 0

1

Entering edit mode

could you plot the p-value (before multi-testing correction) distribution for the two analysis.

ADD REPLY • link 7.1 years ago by Nicolas Rosewick 11k

0

Entering edit mode

yes, absolutely. Thank you for your help. I will do so tonight and add the plots.

ADD REPLY • link 7.1 years ago by Laura • 0

0

Entering edit mode

I have added pictures of the plot the at bottom of the original message (hope this was what you meant anyway).

ADD REPLY • link 7.1 years ago by Laura • 0

1

Entering edit mode

Sounds like the normalization step might have an error if you have similar p&q values? This is why it forms a plateau. Did you do voom, limma-trend or other type of normalization?

ADD REPLY • link 7.1 years ago by theobroma22 ★ 1.2k

0

Entering edit mode

hi, sorry just to clarify the p-q values aren't similar in value they just show a similar trend e.g. a big subset of transcripts had almost identical values. I used TMM to normalize. How could I check if something went wrong there?

Also, I wrote one script and ran all the sub-phenotype analysis against it (with the obvious minor adjustments) so would it make sense that only in one comparison the normalization went wrong but in none of the others? I don't know :-)

ADD REPLY • link 7.1 years ago by Laura • 0

1

Entering edit mode

In practice, it is a good exercise to evaluate the other available normalization methods since for RNA-seq the library size is a critical piece of the normalization process. For example, voom is used if the library sizes differ by more than 3%. Once you've done this you can choose the best normalization method. To choose the best method you can compare the before and after normalization plots.

ADD REPLY • link 7.1 years ago by theobroma22 ★ 1.2k

0

Entering edit mode

Thank you! I will try other normalization methods and see if that 'fixes' the problem! Much appreciate your help!

ADD REPLY • link 7.1 years ago by Laura • 0