Question

Random dataset and DESeq2

10 months ago

Saleh • 0

Hi everyone,

I have real gene count data, which I reshuffled (the counts for each gene in a column were shuffled), and I ran DESeq2 to check whether I would get any significant genes. I got more than a hundred significant genes. Isn't this result surprising? Why am I getting so many significant genes? Is there something wrong with my approach?
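"Shuffling the counts" can be read in two quite different ways, which is why the shuffling scheme matters. The sketch below shows both readings. It assumes genes as rows and samples as columns, the layout DESeq2 expects for a count matrix, and runs on a small hypothetical Poisson toy matrix rather than real data. The function names are illustrative only.

```python
import numpy as np

def shuffle_within_samples(counts, rng):
    """Permute each column (sample) independently across genes.
    Each sample keeps its library size, but counts lose their gene identity."""
    out = counts.copy()
    for j in range(out.shape[1]):
        out[:, j] = rng.permutation(out[:, j])
    return out

def shuffle_within_genes(counts, rng):
    """Permute each row (gene) independently across samples.
    Each gene keeps its own set of counts, but loses its link to the sample labels."""
    out = counts.copy()
    for i in range(out.shape[0]):
        out[i, :] = rng.permutation(out[i, :])
    return out

rng = np.random.default_rng(0)
counts = rng.poisson(lam=50, size=(1000, 6))  # hypothetical toy data: 1000 genes x 6 samples

a = shuffle_within_samples(counts, rng)
b = shuffle_within_genes(counts, rng)

print(np.array_equal(a.sum(axis=0), counts.sum(axis=0)))  # True: per-sample totals preserved
print(np.array_equal(b.sum(axis=1), counts.sum(axis=1)))  # True: per-gene totals preserved
```

Permuting the sample labels as a whole, rather than each gene independently, is a third option. Unlike the per-gene shuffle, it keeps gene-gene correlations intact.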

DESeq2 • 601 views

updated 10 months ago by LauferVA 4.2k • written 10 months ago by Saleh • 0

I don't think you've given enough detail about your approach (what's the data? are you correcting for covariates? how did you shuffle? how are you defining significant? how are you estimating library sizes?) to enable anyone to comment on whether it's right or wrong.

10 months ago by LChart 3.9k

score 1 · Answer 1 · 2023-06-03

Saleh

Not only is this not surprising, it is the basis for an entire class of statistical techniques known as permutation-based testing. These can be used to derive accurate test statistics when one is worried, for one reason or another, about insufficient Type I error control. I think that is the place to start reading (try https://en.wikipedia.org/wiki/Permutation_test first, then move on to the literature from there).
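To make the idea of a permutation test concrete, here is a minimal sketch for a single gene and two groups. It is a generic illustration, not what DESeq2 does internally: it uses a plain difference in means as the test statistic, and the function name and example values are hypothetical.

```python
import numpy as np

def permutation_pvalue(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means between groups x and y.
    The null distribution is built by repeatedly reshuffling the group labels."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    n_x = len(x)
    observed = abs(x.mean() - y.mean())
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        stat = abs(perm[:n_x].mean() - perm[n_x:].mean())
        if stat >= observed:
            hits += 1
    # the +1 terms keep the p-value from ever being exactly zero
    return (hits + 1) / (n_perm + 1)

# hypothetical expression values for one gene in two conditions
x = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 11.0, 13.0])
y = np.array([20.0, 22.0, 21.0, 23.0, 22.0, 24.0, 21.0, 23.0])
print(permutation_pvalue(x, y))
```

With very few replicates per group, the number of distinct label permutations is small, which puts a floor on the smallest p-value a permutation test can return.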

If these permuted test statistics never produced significant results, they could not be used to empirically derive alpha (the threshold for Type I error).
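One way to derive a significance threshold empirically from permuted data is a maxT-style (Westfall-Young) permutation threshold. This is a swapped-in technique for illustration, not DESeq2's method. The sketch assumes log-scale expression values, uses a simple per-gene Welch t statistic instead of DESeq2's negative binomial test, and runs on simulated data. The function names are hypothetical.

```python
import numpy as np

def welch_t(a, b):
    """Per-gene Welch t statistic; a and b are (genes x samples) arrays."""
    va = a.var(axis=1, ddof=1) / a.shape[1]
    vb = b.var(axis=1, ddof=1) / b.shape[1]
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(va + vb)

def empirical_threshold(data, labels, alpha=0.05, n_perm=200, seed=0):
    """Permute the sample labels, record the largest |t| across all genes for each
    permutation, and return the (1 - alpha) quantile of those maxima. Under the
    permutation null, any gene exceeds this cutoff in only about alpha of permutations."""
    rng = np.random.default_rng(seed)
    max_stats = np.empty(n_perm)
    for k in range(n_perm):
        perm = rng.permutation(labels)
        t = welch_t(data[:, perm == 0], data[:, perm == 1])
        max_stats[k] = np.nanmax(np.abs(t))
    return np.quantile(max_stats, 1 - alpha)

# simulated log-expression: 500 genes x 8 samples, two groups of 4, no real effect
rng = np.random.default_rng(42)
data = rng.normal(size=(500, 8))
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

thr = empirical_threshold(data, labels)
t_obs = welch_t(data[:, labels == 0], data[:, labels == 1])
print(thr, int((np.abs(t_obs) > thr).sum()))
```

Taking the maximum over genes, rather than treating each gene separately, is what makes the cutoff account for the number of tests.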

You haven't told us how many gene annotations you are using, but nearly all commonly used gene sets contain roughly 18,000 to 50,000 genes. 100 DEGs is therefore between 1 in 180 and 1 in 500, which is not at all unreasonable, even after controlling for multiple testing using the false discovery rate (FDR).
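The FDR correction referred to here is usually Benjamini-Hochberg; DESeq2's `results()` reports BH-adjusted p-values in the `padj` column by default. Below is a minimal sketch of that adjustment, together with the 1-in-180 / 1-in-500 arithmetic. The p-values are made-up illustration values, not from the poster's data.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (same idea as p.adjust(method = "BH") in R)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity from the largest p-value downwards, then cap at 1
    adjusted = np.minimum(np.minimum.accumulate(ranked[::-1])[::-1], 1.0)
    out = np.empty(n)
    out[order] = adjusted
    return out

# the ratio quoted above: 100 DEGs out of 18,000 or 50,000 genes
n_deg = 100
for n_genes in (18_000, 50_000):
    print(f"1 in {n_genes // n_deg}")  # prints "1 in 180", then "1 in 500"

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
padj = bh_adjust(pvals)
print(int((padj < 0.05).sum()))  # 2
```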