Suppose I have processed RNA-seq data for two cell conditions, giving log2 fold changes in expression between them.
I want to test the hypothesis that a subgroup of genes is upregulated more strongly than genes overall. To do this, I create one column with the log2 fold changes of all genes whose changes are statistically significant (say, 10k genes) and a second column with the log2 fold changes of the genes in my subgroup of interest whose changes are statistically significant (say, 1k genes). I then run a two-sample t-test on these two groups of genes and get a P value that is quite low.
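To make the procedure concrete, here is a minimal sketch in Python. The numbers are simulated placeholders, not my data. The Welch (unequal-variance) form of the test and the normal approximation for the P value are assumptions made for this sketch, and the names `welch_t`, `all_genes` and `subgroup` are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Return (t, approximate two-sided P) for Welch's two-sample t-test.

    The P value uses a normal approximation to the t distribution,
    which is reasonable only when both samples are large.
    """
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    p = 2 * NormalDist().cdf(-abs(t))
    return t, p


random.seed(0)
# Simulated placeholder log2 fold changes, NOT real RNA-seq results.
all_genes = [random.gauss(0.0, 1.0) for _ in range(10_000)]
subgroup = [random.gauss(0.3, 1.0) for _ in range(1_000)]

t, p = welch_t(subgroup, all_genes)
print(f"t = {t:.2f}, approximate P = {p:.3g}")
```

With SciPy installed, `scipy.stats.ttest_ind(subgroup, all_genes, equal_var=False)` should give the same statistic with an exact t-distribution P value.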
Am I doing this right?
PS. Someone suggested that I use the Mann-Whitney test instead of the two-sample t-test. Could statisticians please comment on this?
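For comparison, the suggested Mann-Whitney test could be sketched as follows. This is a simplified version under stated assumptions: it uses average ranks for ties and a normal approximation with no tie or continuity correction, so it is only meant for large samples. The data are again simulated placeholders, and `mann_whitney_u` is a hypothetical helper name.

```python
import random
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(a, b):
    """Return (U statistic for sample a, approximate two-sided P).

    Ties receive average ranks. The P value comes from a normal
    approximation without tie or continuity correction.
    """
    pooled = sorted([(x, 0) for x in a] + [(x, 1) for x in b])
    n = len(pooled)
    rank_sum_a = 0.0
    i = 0
    while i < n:
        j = i
        # Advance j past the run of values tied with pooled[i].
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        count_a = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_a += avg_rank * count_a
        i = j
    na, nb = len(a), len(b)
    u = rank_sum_a - na * (na + 1) / 2
    mu = na * nb / 2
    sigma = sqrt(na * nb * (na + nb + 1) / 12)
    z = (u - mu) / sigma
    return u, 2 * NormalDist().cdf(-abs(z))


random.seed(0)
# Simulated placeholder log2 fold changes, NOT real RNA-seq results.
all_genes = [random.gauss(0.0, 1.0) for _ in range(10_000)]
subgroup = [random.gauss(0.3, 1.0) for _ in range(1_000)]

u, p = mann_whitney_u(subgroup, all_genes)
print(f"U = {u:.0f}, approximate P = {p:.3g}")
```

With SciPy installed, `scipy.stats.mannwhitneyu(subgroup, all_genes, alternative="two-sided")` is the usual call and applies the corrections omitted here.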