Question: using fdr to gate values in NGS comparison ttest
6.2 years ago by
United States
BarristanTheBold0 wrote:

New to NGS analysis, but that's the task I've been assigned. I have received NGS data that I am trying to decipher.

I'm trying to learn what exactly is meant by "unadjusted p-value" and "FDR" when looking at comparison t-tests of genes (the comparisons are between NGS data from animals treated with drug or placebo). I understand the basic concepts, but not how to make practical use of them. Most of the values seem fairly large (well over 0.1 for p-values, in the 0.1 to 0.9 range for FDR) when looking at data sets of ~20,000 to 40,000 members. My goal here is to determine a cutoff for each that would allow me to gate on the genes with meaningful expression differences. Is there a specific value I should use as the boundary, or some way to calculate it based on the sample size or something?

ngs unadjusted p-value fdr • 2.2k views
modified 6.2 years ago by Devon Ryan96k • written 6.2 years ago by BarristanTheBold0
6.2 years ago by
Devon Ryan96k
Freiburg, Germany
Devon Ryan96k wrote:

Ignore unadjusted p-values completely. Unadjusted p-values, also called "raw p-values" or simply p-values, don't have much relevance individually when you perform multiple testing (see this XKCD comic for a nice example of why multiple testing and fishing for changes increase false-positive rates). A common threshold for adjusted p-values (or FDR) is 0.1 (as with p-value thresholds in general, there's some wiggle room here). That's a bit higher than the typical 0.05 that you'd use with a raw p-value, but it turns out to be a convenient trade-off. After making a list of significant findings, sort them by fold-change to help prioritize results.
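
To make the "gate on FDR, then sort by fold-change" workflow concrete, here is a minimal sketch in plain Python. It assumes the FDR values come from the Benjamini-Hochberg procedure, which is a common choice but is not stated in the thread. The gene names, fold changes and p-values are invented for illustration, and `bh_adjust` and `significant_genes` are hypothetical helper names.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never
    # increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(results, fdr_cutoff=0.1):
    """Gate on FDR, then sort the survivors by absolute log2 fold change.

    results: list of (gene, log2_fold_change, raw_p) tuples.
    This layout is an assumption made for the example.
    """
    fdr = bh_adjust([p for _, _, p in results])
    hits = [(gene, lfc, q)
            for (gene, lfc, _), q in zip(results, fdr)
            if q <= fdr_cutoff]
    return sorted(hits, key=lambda hit: abs(hit[1]), reverse=True)


# Made-up toy data: (gene, log2 fold change, raw p-value).
toy = [
    ("geneA", 0.03, 0.0001),
    ("geneB", 1.6, 0.004),
    ("geneC", -2.1, 0.02),
    ("geneD", 0.4, 0.3),
    ("geneE", -0.1, 0.8),
]

for gene, lfc, q in significant_genes(toy):
    print(gene, lfc, round(q, 4))
```

In this toy set, geneA has the smallest adjusted p-value but ends up last after sorting, because its fold change is tiny. With a real data set you would normally take the FDR column from whatever tool produced the results rather than recompute it.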

This. I see so many people making the mistake of assuming a low p-value means a large effect size.

Isn't the way to combat that to just lower your threshold for calling something significant?


Never confuse statistical significance and biological relevance.


No. A p-value is a measure of significance, and is therefore more related to variation and sample consistency. If all the drug-treated samples were at 102.1% expression plus or minus 0.001, that gene would show a difference with high certainty but without much biological relevance, compared to another gene at 300% plus or minus 50. As Devon said, use FDR to gate, then sort for high fold change. They will be correlated.
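
The 102.1% versus 300% example can be checked with a rough sketch. It uses Welch's t statistic as an assumed stand-in for whatever test produced the original p-values, and all expression values below are invented to match the "plus or minus" figures in the comment, with placebo taken as roughly 100%.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(control, treated):
    """Welch's t statistic for treated versus control (unequal variances)."""
    standard_error = sqrt(variance(control) / len(control)
                          + variance(treated) / len(treated))
    return (mean(treated) - mean(control)) / standard_error


# Made-up expression values, as percent of the placebo mean.
# "Tight" gene: tiny shift (about 102.1%) with almost no spread.
tight_control = [100.000, 100.001, 99.999, 100.000]
tight_treated = [102.100, 102.101, 102.099, 102.100]

# "Wide" gene: large shift (about 300%) with a spread of about 50.
wide_control = [100.0, 50.0, 150.0, 100.0]
wide_treated = [300.0, 250.0, 350.0, 300.0]

t_tight = welch_t(tight_control, tight_treated)
t_wide = welch_t(wide_control, wide_treated)
fc_tight = mean(tight_treated) / mean(tight_control)
fc_wide = mean(wide_treated) / mean(wide_control)

print("tight gene: t =", t_tight, "fold change =", fc_tight)
print("wide gene:  t =", t_wide, "fold change =", fc_wide)
```

The tight gene gets a far larger t statistic (and so a far smaller p-value) even though its fold change is negligible, which is why lowering the significance threshold alone does not select for large effects.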