Question: Help understanding the meaning of variables in DESeq design matrix
Kristin Muench (United States) wrote, 7 months ago:

Hello,

I have a dataset with a variety of samples that vary in Age (Young/Old) and Sex (M/F).

I'm interested in testing a few hypotheses, including (Q1) "What genes are DE as a product of Sex?" and (Q2) "What genes are DE as a result of the interaction of Age and Sex?"

To answer Q1, I originally imported data like so:

# Attempt 1
myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable_forAllMySamples,
                                     directory = pathToHTSeq,
                                     design = ~Sex)
dds <- DESeq(myData)

This produced a very large DE gene list.

Later, I redid this analysis with a different design matrix including interaction and contrasts, like so:

# Attempt 2
myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable_forAllMySamples,
                                     directory = pathToHTSeq,
                                     design = ~Sex+Age+Sex:Age)
dds <- DESeq(myData)
# results() takes the argument `contrast` (singular), not `contrasts`
res <- results(dds, contrast = c('Sex', 'M', 'F'))

However, this produced a MUCH smaller list of DE genes.

My understanding is that pulling out this contrast should test for the main effect of Sex in my dataset (so, Sex effects regardless of Age).

I had expected that would be the same as if I just made design matrix ~Sex, but it looks like that isn't the case. Why is that?

Is that because Attempt 2's design matrix "controls" for Age and any interaction effects, but Attempt 1 does not? Can anyone help me understand a bit better what is being tested in Attempt 1, or point me towards resources to strengthen my understanding of what that was doing?

Possibly relevant: When I PCA plotted my rlog-normalized data, the data clustered very well by Sex, and less well by Age.

Thank you very much for your help!

Tags: rna-seq, deseq2, R
Devon Ryan (Freiburg, Germany) wrote, 7 months ago:

Things like slightly imbalanced group sizes (in this case, the numbers of males and females at each age), as well as differences in statistical power (which increases with sample size), are the prime causes of this. I should note that the results with just ~Sex as the design likely contain more false positives, since they don't account for the confounder of age (as you astutely surmised).
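
To make the difference between the two designs concrete, here is a minimal sketch in base R (no DESeq2 required). The toy data, the cell means, and the names `toy`, `fit_full` and `fit_sex` are all hypothetical and only for illustration; an ordinary linear model stands in for the negative binomial GLM that DESeq2 fits. It shows that the Sex coefficient does not measure the same quantity under the two designs: with the interaction term it is the M vs F difference within the reference Age level only, whereas with ~Sex alone it is the M vs F difference averaged over both ages.

```r
# Hypothetical toy data: one "gene", two replicates per Sex x Age cell, no noise.
toy <- expand.grid(Sex = c("F", "M"), Age = c("Old", "Young"), rep = 1:2,
                   stringsAsFactors = TRUE)

# Assumed cell means, chosen so that the Sex effect differs between ages.
cell_means <- c("F.Old" = 10, "M.Old" = 12, "F.Young" = 11, "M.Young" = 16)
toy$y <- unname(cell_means[paste(toy$Sex, toy$Age, sep = ".")])

# Columns of the design matrix for the interaction design.
# Reference levels are "F" and "Old" (first level of each factor).
design_cols <- colnames(model.matrix(~ Sex + Age + Sex:Age, data = toy))
print(design_cols)

# Interaction design: SexM is the M vs F difference in the reference
# age group only (12 - 10).
fit_full <- lm(y ~ Sex + Age + Sex:Age, data = toy)
print(coef(fit_full)["SexM"])

# Sex-only design: SexM is the M vs F difference averaged over both
# age groups (14 - 10.5).
fit_sex <- lm(y ~ Sex, data = toy)
print(coef(fit_sex)["SexM"])
```

In DESeq2 the fitted coefficient names can be listed with resultsNames(dds). As I understand the DESeq2 documentation on interactions, under ~Sex+Age+Sex:Age the contrast c('Sex', 'M', 'F') likewise corresponds to the Sex effect in the reference Age level rather than to an effect averaged over ages, which is one more reason the two gene lists need not agree.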

It's pretty common for samples to cluster strongly by sex; its effect isn't as variable as something like age.

Kristin Muench replied, 7 months ago:

Thank you! This is helpful.