Question: Help understanding the meaning of variables in DESeq design matrix
Asked 11 months ago by Kristin Muench (United States):

Hello,

I have a dataset with a variety of samples that vary in Age (Young/Old) and Sex (M/F).

I'm interested in testing a few hypotheses, including (Q1) "What genes are DE as a product of Sex?" and (Q2) "What genes are DE as a result of the interaction of Age and Sex?"

To answer Q1, I originally imported data like so:

# Attempt 1
myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable_forAllMySamples,
                                     directory = pathToHTSeq,
                                     design = ~Sex)
dds <- DESeq(myData)

This produced a very large DE gene list.

Later, I redid this analysis with a different design matrix including interaction and contrasts, like so:

# Attempt 2
myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable_forAllMySamples,
                                     directory = pathToHTSeq,
                                     design = ~Sex+Age+Sex:Age)
dds <- DESeq(myData)
# results() takes the argument 'contrast' (singular)
res <- results(dds, contrast = c('Sex', 'M', 'F'))

However, this produced a MUCH smaller list of DE genes.

My understanding is that pulling out the contrast should test for the main effect of Sex in my dataset (so, Sex effects regardless of Age).

I had expected that would be the same as if I just made design matrix ~Sex, but it looks like that isn't the case. Why is that?

Is that because Attempt 2's design matrix "controls" for Age and any interaction effects, but Attempt 1 does not? Can anyone help me understand a bit better what is being tested in Attempt 1, or point me towards resources to strengthen my understanding of what that was doing?

Possibly relevant: When I PCA plotted my rlog-normalized data, the data clustered very well by Sex, and less well by Age.

Thank you very much for your help!

Tags: rna-seq, deseq2, R
Answer, 11 months ago, by Devon Ryan (Freiburg, Germany):

Things like slightly imbalanced group sizes (in this case, the numbers of males and females at each age), as well as differences in power, which increases with sample size, are the prime causes for this. I should note that the results with just ~Sex as the design likely have more false positives, since they don't account for the confounder of age (as you astutely surmised).
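To make the confounding point concrete, here is a minimal base R sketch. It uses an ordinary linear model (lm) as a stand-in for DESeq2's negative binomial GLM, and the toy data frame, its group sizes, and the column name expr are all hypothetical. The toy expression value depends on Age only, yet a ~Sex model still reports a Sex effect because the groups are imbalanced.

```r
# Hypothetical, noise-free toy data: expression depends on Age only, not on Sex.
# Group sizes are deliberately imbalanced (females mostly Young, males mostly Old).
toy <- data.frame(
  Sex = factor(rep(c("F", "M"), each = 6), levels = c("F", "M")),
  Age = factor(c("Young", "Young", "Young", "Young", "Old", "Old",
                 "Young", "Young", "Old", "Old", "Old", "Old"),
               levels = c("Young", "Old"))
)
toy$expr <- ifelse(toy$Age == "Old", 2, 0)

# Design ~Sex: the Sex coefficient absorbs part of the Age effect.
sex_only <- coef(lm(expr ~ Sex, data = toy))["SexM"]

# Design ~Sex + Age: once Age is in the model, the Sex coefficient vanishes.
sex_adjusted <- coef(lm(expr ~ Sex + Age, data = toy))["SexM"]

print(sex_only)      # about 0.667
print(sex_adjusted)  # about 0, up to floating-point error
```

With real counts the picture is noisier, but the mechanism is the same: whatever Age explains gets attributed to Sex when Age is left out of the design.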

It's pretty common for samples to cluster strongly by sex; its effect isn't as variable as that of something like age.
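On what the Sex contrast in Attempt 2 actually tests: with a Sex:Age interaction in the design, the Sex coefficient is the M vs F difference at the reference level of Age, not an average over both ages. The sketch below shows this with base R's lm as a stand-in for the DESeq2 fit. The cell means, the column name expr, and the assumption that "F" and "Young" are the reference levels are all hypothetical.

```r
# Hypothetical balanced layout: 2 replicates in each Sex x Age cell.
cells <- expand.grid(Sex = c("F", "M"), Age = c("Young", "Old"),
                     stringsAsFactors = FALSE)
cells$Sex <- factor(cells$Sex, levels = c("F", "M"))
cells$Age <- factor(cells$Age, levels = c("Young", "Old"))
balanced <- cells[rep(1:4, each = 2), ]

# Hypothetical noise-free cell means: M - F is 1 in Young and 3 in Old.
balanced$expr <- with(balanced,
                      ifelse(Sex == "M", ifelse(Age == "Old", 3, 1), 0))

fit_int <- coef(lm(expr ~ Sex + Age + Sex:Age, data = balanced))
fit_sex <- coef(lm(expr ~ Sex, data = balanced))

print(fit_int["SexM"])         # about 1: M vs F within Young only
print(fit_int["SexM:AgeOld"])  # about 2: extra M vs F difference in Old
print(fit_sex["SexM"])         # about 2: M vs F pooled over both ages
```

So the two attempts are not testing the same hypothesis, which is one more reason the gene lists can differ. In DESeq2, resultsNames(dds) lists the coefficients that were actually fitted, which helps confirm the reference levels.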


Reply, 11 months ago, by Kristin Muench:

Thank you! This is helpful.