How big are the subsets and full sets? You likely want GSEA instead...
The full set is 7,000 genes, and the subsets are around 1,000 to 2,000 genes.
N.B., you'll need abs(log2 fold change) for Mann-Whitney to make sense.
With a subset of that size the question becomes how you want to think about the comparison. Given your background question, I suspect that GSEA is answering the question you want to ask (i.e., "given two groups, is the subset 'perturbed' in one group versus the other with respect to the other genes").
Hmm, if I take abs(log2 fold change), then how do I distinguish between up- and downregulated? This will be a problem... But what if I just use the two-sample t-test, would that be problematic?
You're comparing a subset versus the whole. The direction of change of individual genes is completely irrelevant to that question.
I actually don't understand the argument about "abs". Why are negative values a problem for Mann-Whitney?
I just need one quick calculation and would rather not install GSEA for that. Do you know which statistical test GSEA would use in this case?
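To illustrate the point about "abs": negative values are not a problem for the test itself, since Mann-Whitney only looks at ranks. The issue is that a subset with genes moving strongly in both directions has very low and very high ranks that cancel out. Below is a minimal sketch in plain Python (standard library only) with made-up log2 fold changes; the function names and all values are hypothetical and only meant to show the effect, not to serve as a full test with p-values.

```python
def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for x: count of (x, y) pairs with x > y, ties counting 0.5."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


# Hypothetical log2 fold changes: the subset moves strongly in BOTH directions.
subset = [-2.0, -1.5, 1.5, 2.0]
rest = [-0.4, -0.2, -0.1, 0.1, 0.2, 0.4]

null_u = len(subset) * len(rest) / 2  # expected U when there is no shift

u_signed = mann_whitney_u(subset, rest)
u_abs = mann_whitney_u([abs(v) for v in subset], [abs(v) for v in rest])

print(f"expected under no shift: {null_u}")
print(f"signed values:   U = {u_signed}")
print(f"absolute values: U = {u_abs}")
```

With these toy numbers the signed comparison gives U = 12, exactly the value expected when there is no shift, while the absolute values give U = 24, the maximum possible for 4 versus 6 genes. If direction matters, one option is to run the signed test separately on the up- and downregulated genes.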
It's largely a more complicated version of ks.test() in R.
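For a rough idea of what that comparison measures, here is a minimal sketch of the plain two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs), written in standard-library Python. It is not GSEA itself, which uses a weighted running-sum variant plus permutations, and it does not compute a p-value. The function name and the log2 fold changes are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(x, y):
    """Largest absolute gap between the empirical CDFs of x and y."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:  # the ECDFs only change at observed values
        fx = bisect_right(xs, v) / len(xs)  # fraction of x that is <= v
        fy = bisect_right(ys, v) / len(ys)  # fraction of y that is <= v
        d = max(d, abs(fx - fy))
    return d


# Hypothetical log2 fold changes for a subset and for the remaining genes.
subset = [-2.0, -1.5, 1.5, 2.0]
rest = [-0.4, -0.2, -0.1, 0.1, 0.2, 0.4]

d_signed = ks_statistic(subset, rest)
d_abs = ks_statistic([abs(v) for v in subset], [abs(v) for v in rest])

print(f"signed values:   D = {d_signed:.2f}")
print(f"absolute values: D = {d_abs:.2f}")
```

On these toy values D is 0.50 for the signed data and 1.00 for the absolute values. Unlike a rank-sum test, the KS statistic reacts to any difference in the shape of the distributions, so it picks up some signal even on the signed values.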