I'm working on a case/control study that also includes other variables, such as genotypes. One thing I want to do is a subgroup analysis of the genotypes within just the case group. The brute-force method is to subset my data to the case group and run DESeq on that. I'm very new to DESeq and to the very idea of contrasts, but from my reading it seems that contrasts offer a more elegant way to do subgroup analyses without rerunning DESeq multiple times on differently subsetted data.
Suppose I have the following variables:
- Case: [Case, Control]
- GeneA: [YY, YN, NN]
- GeneB: [YY, YN, NN]
I run DESeq with the design "~ Case + GeneA + GeneB + GeneA:GeneB".
How would I write contrasts that would be the equivalent of the following?
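To make the question concrete, here is a minimal base-R sketch of the coefficients this design formula expands to. The `layout` data frame is a hypothetical balanced layout built with `expand.grid` purely for illustration (it is not my real sample sheet), and only `model.matrix` from base R is used, so nothing here depends on DESeq itself.

```r
# Hypothetical balanced layout: every Case x GeneA x GeneB combination, twice.
# Illustration only; this is not the real sample sheet.
layout <- expand.grid(
  Case  = c("Case", "Control"),
  GeneA = c("YY", "YN", "NN"),
  GeneB = c("YY", "YN", "NN"),
  rep   = 1:2,
  stringsAsFactors = TRUE
)

# Same formula as the design above; base R expands it into dummy columns.
mm <- model.matrix(~ Case + GeneA + GeneB + GeneA:GeneB, data = layout)

# One column per coefficient: intercept, 1 for Case, 2 for GeneA,
# 2 for GeneB, and 4 for the GeneA:GeneB interaction.
colnames(mm)
ncol(mm)
```

Which level ends up as the reference for each factor depends on the factor level order, so the exact column names will differ if the levels are ordered differently.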
- Subgroup just those with the disease (Case="Case")
- Subgroup just homozygous genotypes: (GeneA = [YY, NN], GeneB=[YY,NN])
- Ask the question: Is the strongest effect due to GeneA, GeneB, or the interaction?
So, for instance, given my design, to subgroup Case=="Case", my contrast would be
Here is a putative study design to work with:
    > fake_study_design = data.frame(
    +   Case=sample(c('Case', 'Control'), 10, replace=T),
    +   GeneA=sample(c('YY','YN','NN'), 10, replace=T),
    +   GeneB=sample(c('YY', 'YN', 'NN'), 10, replace=T)
    + )
    > fake_study_design
          Case GeneA GeneB
    1     Case    YY    YY
    2     Case    YN    NN
    3  Control    YY    NN
    4  Control    YY    YN
    5  Control    YN    YY
    6  Control    NN    NN
    7  Control    NN    YY
    8     Case    NN    YN
    9  Control    YY    NN
    10 Control    YY    YY
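For comparison, the brute-force subsetting that I would like the contrasts to replace can be sketched in base R as follows. The table is typed in by hand from the printout above so that the example is reproducible (`sample()` would give a different draw each time); the names `homozygous` and `case_homozygous` are just illustrative.

```r
# The table printed above, entered by hand so the result is reproducible.
fake_study_design <- data.frame(
  Case  = c("Case", "Case", "Control", "Control", "Control",
            "Control", "Control", "Case", "Control", "Control"),
  GeneA = c("YY", "YN", "YY", "YY", "YN", "NN", "NN", "NN", "YY", "YY"),
  GeneB = c("YY", "NN", "NN", "YN", "YY", "NN", "YY", "YN", "NN", "YY")
)

homozygous <- c("YY", "NN")

# Brute-force subgroup: cases only, homozygous at both genes.
case_homozygous <- subset(
  fake_study_design,
  Case == "Case" & GeneA %in% homozygous & GeneB %in% homozygous
)
case_homozygous
```

With this particular fake draw only sample 1 survives both filters, so a real analysis along these lines would need far more samples per subgroup than the toy table has.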