I would like to ask for your assistance with the following problem: I have a subgroup of genes that contain a certain motif, and I would like to know whether the presence of this motif significantly changes the expression of these genes. I have RNA-Seq data for a control (WT) and two over-expression and mutant lines.
So far I have come up with one approach, which I outline below, but I'm uncertain whether it is correct, and I would like to know if there are other methods for testing the significance of my subgroup.
My planned approach is as follows:
- Obtain the significantly differentially expressed genes with edgeR, by comparing WT with the over-expression and mutant conditions.
- Divide the genes into three categories based on the edgeR output: +1 if a gene is significantly differentially expressed AND up-regulated, -1 if it is significantly differentially expressed AND down-regulated, and 0 if it is not significant.
- Perform a chi-square test on the categorized data, comparing the category frequencies/percentages of the subgroup with those of all the genes (including those of the subgroup).
- Run a bootstrap analysis (resampling with replacement) and obtain a one-sided p-value.
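
To make the last three steps concrete, here is a minimal Python sketch of what I have in mind, using only the standard library. The function names, the 0.05 cutoff, and the toy counts are all made up for illustration; `log_fc` and `fdr` stand for the logFC and FDR columns of the edgeR results table. Because the subgroup is itself part of "all genes", the chi-square part is written as a goodness-of-fit test of the subgroup counts against the proportions among all genes, and the p-value uses the closed form exp(-x/2), which only holds for 2 degrees of freedom (three categories).

```python
import math
import random
from collections import Counter

CATEGORIES = (-1, 0, 1)

def categorize(log_fc, fdr, alpha=0.05):
    """+1 / -1 for significantly up- / down-regulated genes, 0 otherwise.
    alpha=0.05 is an assumed cutoff, not something edgeR prescribes."""
    if fdr >= alpha:
        return 0
    return 1 if log_fc > 0 else -1

def chi_square_gof(subgroup_cats, all_cats):
    """Chi-square goodness-of-fit statistic: observed subgroup counts vs.
    counts expected from the category proportions among all genes."""
    observed = Counter(subgroup_cats)
    background = Counter(all_cats)
    n_sub, n_all = len(subgroup_cats), len(all_cats)
    stat = 0.0
    for cat in CATEGORIES:
        expected = n_sub * background[cat] / n_all
        if expected > 0:  # skip categories that never occur in the background
            stat += (observed[cat] - expected) ** 2 / expected
    return stat

def chi_square_p_value_df2(stat):
    """Upper-tail p-value of a chi-square statistic with exactly 2 degrees
    of freedom (three categories), where the tail is exp(-x / 2)."""
    return math.exp(-stat / 2)

def resampling_p_value(subgroup_cats, all_cats, category=1,
                       n_iter=10000, seed=0):
    """One-sided p-value: fraction of random gene sets of the same size,
    drawn with replacement from all genes, that contain at least as many
    genes in `category` as the real subgroup."""
    rng = random.Random(seed)
    observed = sum(1 for c in subgroup_cats if c == category)
    k = len(subgroup_cats)
    hits = 0
    for _ in range(n_iter):
        sample = rng.choices(all_cats, k=k)  # sampling with replacement
        if sum(1 for c in sample if c == category) >= observed:
            hits += 1
    # add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_iter + 1)

# Toy data (invented): 1000 genes overall, 50 of them carry the motif.
all_genes = [1] * 100 + [-1] * 100 + [0] * 800
motif_genes = [1] * 20 + [-1] * 5 + [0] * 25

stat = chi_square_gof(motif_genes, all_genes)
p_chi2 = chi_square_p_value_df2(stat)
p_up = resampling_p_value(motif_genes, all_genes, category=1)
print(stat, p_chi2, p_up)
```

In this sketch the resampling step asks specifically whether the subgroup has more up-regulated genes than a random gene set of the same size; passing `category=-1` would ask the same question for down-regulated genes.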
And that's about it. I don't have much experience with this kind of analysis and my statistics are not that strong, so please correct me if I have made any mistakes or if you know of a better way of testing!
Thanks in advance to everyone who takes the time to read this and to anyone who is willing and able to help me with my problem.