Dear statisticians:
Suppose I have two cell conditions and processed RNA-seq data for both, giving log2 fold changes in expression between them.
I want to test the hypothesis that a subgroup of genes is more strongly upregulated than genes overall. To do this, I create one column with the log2 fold changes of all genes whose changes are statistically significant (say, 10k genes), and a second column with the log2 fold changes of the genes in my subgroup of interest whose changes are statistically significant (say, 1k genes). Then I run a two-sample t-test on these two groups of genes, and I get a P value that is quite low.
Could you please comment on whether I am doing this right?
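To make the comparison concrete, here is a minimal Python sketch of a two-sample t statistic on two columns of log2 fold changes. It assumes the Welch (unequal-variance) variant, which may not be the variant used in the actual analysis. The function name `welch_t`, the variable names, and the toy values are made up for illustration only and are not real data.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    # Squared standard errors of the two sample means (sample variance / n).
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Toy log2 fold changes (hypothetical values, not from any experiment).
all_sig_lfc = [-1.2, 0.8, 1.1, -0.9, 1.5, -2.0, 0.7, 1.3]  # all significant genes
subgroup_lfc = [1.1, 1.5, 1.3, 2.2]                         # significant genes in the subgroup

t_stat, df = welch_t(subgroup_lfc, all_sig_lfc)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

Note that in this layout the subgroup genes also appear in the "all genes" column, so the two samples overlap; the sketch does nothing to account for that.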
Thanks
PS. Someone suggested that I should use the Mann-Whitney test instead of the two-sample t-test. Could statisticians please comment on this?
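For reference, the suggested Mann-Whitney alternative compares ranks rather than means. Below is a rough Python sketch, assuming a two-sided test with the normal approximation and no tie correction in the variance; a statistics package would normally handle these details more carefully. The function name `mann_whitney_u` and the toy values are hypothetical.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Return the U statistic for sample_a and an approximate two-sided p-value."""
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at i and give each its average rank.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    # Normal approximation to the null distribution of U (no tie correction).
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_two_sided


# Toy log2 fold changes (hypothetical values, not from any experiment).
all_sig_lfc = [-1.2, 0.8, 1.1, -0.9, 1.5, -2.0, 0.7, 1.3]
subgroup_lfc = [1.1, 1.5, 1.3, 2.2]

u_stat, p_value = mann_whitney_u(subgroup_lfc, all_sig_lfc)
print(f"U = {u_stat}, approximate two-sided p = {p_value:.4f}")
```

With samples as small as the toy ones the normal approximation is crude; it is more reasonable at the sizes mentioned above (thousands of genes).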
Did you use the same background to find significant genes in both sets? And are the two sets independent?
Yes, significance was determined once for all genes using the standard workflow; I do not change it when splitting the genes into subsets.
I'm not sure whether it is correct to use the t-test in this case (see Independent t-test using SPSS Statistics). If you are interested in the strength of a statistical signal (the fold changes), you could use a volcano plot.
hth
I was not using an independent t-test; I was using a two-sample t-test.