I have a dataset of tumors and I am checking whether smoking causes a transcriptional difference. We have divided our samples into ever and never smokers.
The same tumors also fall into 3 transcriptional subtypes: A, B, and C. I want to control for the unequal numbers of each subtype within the ever-smoker and never-smoker groups.
The clinical data sheet I give to DESeq2 has 2 columns:
- Smoking: Y or N
- Type: A, B, or C
I encoded this in DESeq2 as:
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ Type + Smoking)
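For illustration, here is a minimal sketch of the full joint-model workflow for the design above, run on simulated data rather than the real tumors. Assumptions: the counts come from DESeq2's `makeExampleDESeqDataSet()` (no true smoking effect), the 6-tumors-per-subtype layout and the 3 ever / 3 never split are hypothetical, and never smokers (`N`) are taken as the reference level.

```r
suppressPackageStartupMessages(library(DESeq2))

set.seed(42)
# Hypothetical simulated counts: 500 genes x 18 tumors, no real smoking signal
cts <- counts(makeExampleDESeqDataSet(n = 500, m = 18))

# Hypothetical clinical sheet: 6 tumors per subtype, 3 never (N) and 3 ever (Y) smokers each
coldata <- data.frame(
  Type    = factor(rep(c("A", "B", "C"), each = 6)),
  Smoking = factor(rep(c("N", "N", "N", "Y", "Y", "Y"), times = 3), levels = c("N", "Y")),
  row.names = colnames(cts)
)

# One model on all tumors: subtype is a covariate, smoking is the effect of interest
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata,
                              design = ~ Type + Smoking)
dds <- DESeq(dds, quiet = TRUE)

# Log2 fold change of ever (Y) vs never (N) smokers, adjusted for Type
res_joint <- results(dds, contrast = c("Smoking", "Y", "N"))
summary(res_joint)
```

Because the variable of interest is last in the design and the contrast is named explicitly, `res_joint` reports the smoking effect with the subtype means already accounted for.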
However, I have been asked to run DESeq2 within each subtype separately and compare the results across the three: run DESeq2 on Type A tumors only, then repeat for Type B only and Type C only. Afterwards, I check which genes appear in all three result lists.
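A minimal sketch of that per-subtype procedure is below, again on simulated data. Assumptions: the same hypothetical 18-tumor layout as before, `design = ~ Smoking` within each subtype, and a hypothetical FDR cutoff `alpha = 0.1` for deciding that a gene "appears" in a subtype's list.

```r
suppressPackageStartupMessages(library(DESeq2))

set.seed(42)
# Hypothetical simulated counts and clinical sheet (same layout as the joint sketch)
cts <- counts(makeExampleDESeqDataSet(n = 500, m = 18))
coldata <- data.frame(
  Type    = factor(rep(c("A", "B", "C"), each = 6)),
  Smoking = factor(rep(c("N", "N", "N", "Y", "Y", "Y"), times = 3), levels = c("N", "Y")),
  row.names = colnames(cts)
)

alpha <- 0.1  # assumed FDR cutoff for calling a gene significant in a subtype

# Fit a separate smoking-only model on the tumors of each subtype
sig_by_type <- lapply(levels(coldata$Type), function(type) {
  keep <- coldata$Type == type
  dds_sub <- DESeqDataSetFromMatrix(countData = cts[, keep],
                                    colData = coldata[keep, , drop = FALSE],
                                    design = ~ Smoking)
  dds_sub <- DESeq(dds_sub, quiet = TRUE)
  res_sub <- results(dds_sub, contrast = c("Smoking", "Y", "N"), alpha = alpha)
  # padj is NA for genes removed by independent filtering; which() drops those
  rownames(res_sub)[which(res_sub$padj < alpha)]
})
names(sig_by_type) <- levels(coldata$Type)

# Genes called significant in all three subtypes
common_genes <- Reduce(intersect, sig_by_type)
lengths(sig_by_type)
common_genes
```

Each per-subtype fit estimates dispersions and fold changes from only that subtype's tumors (6 here instead of 18), so the three lists are built with fewer samples than the joint model uses.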
The first approach gives more interesting results than the second.
My question is: which of these two approaches is mathematically more accurate?