Question: DESeq2 multi-factor design
Written 3.0 years ago by swbarnes2 (8.6k), United States:

Sorry for being the millionth person to ask questions about this topic, but I haven't been able to find a clear answer to my questions.

So I'm using DESeq2 to find DE genes between two tissues. I have 20 samples of tissue 1, and 20 of tissue 2, and I use a design of ~ Tissue, and that gives me results.

But these aren't just 20 random tissue 1 and 20 random tissue 2 samples. There are relationships and structure between the samples.

My samples were taken at 4 different time points, and I want time point differences to be accounted for when calculating whether the genes are differentially expressed. So I use a design of ~ Time + Tissue, and that gives me much the same genes, but with much better p-values. Is that how I should set it up, if my goal is just to look at tissue differences, correcting for time point effects? (I feel like Time:Tissue is not what I want, because I'm not interested in zooming in on the differences at any one particular time point.)

My 40 samples are also paired: there is a tissue 1 and a tissue 2 sample from each of 20 individuals. So could I use a design of ~ Individual + Tissue to get DE genes between tissues, and would this account for differences between individuals? (Separate question: is this a good idea with 20 individuals for 40 samples?)

So I'd really like DE genes between tissues, taking into account variation introduced by time point and individual, but Time + Individual + Tissue won't work, I presume because each individual is present at only a single time point. I can make a new column, with time and sample concatenated together, and do concat + Tissue, but now I've lost my replicates, and I'm not sure that splitting this up into 20 distinct groups is what I really want.
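The rank problem described here can be checked directly. The sketch below is plain Python rather than R, and it assumes a hypothetical layout (20 individuals, 5 per time point, both tissues from each) with treatment-coded dummy columns, similar to what R's model.matrix produces by default. The helper names (matrix_rank, dummy, design) are made up for illustration and are not part of DESeq2.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via exact Gaussian elimination on Fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def dummy(value, levels):
    """Treatment coding: one 0/1 column per level except the first (reference)."""
    return [1 if value == level else 0 for level in levels[1:]]


# Assumed layout: individuals 0-4 at time 0, 5-9 at time 1, and so on.
individuals = list(range(20))
times = list(range(4))
tissues = ["tissue1", "tissue2"]
time_of = {ind: ind // 5 for ind in individuals}
samples = [(ind, time_of[ind], tis) for ind in individuals for tis in tissues]


def design(terms):
    """Build the design matrix (intercept plus dummy columns) for the given terms."""
    rows = []
    for ind, time, tis in samples:
        row = [1]  # intercept
        if "Time" in terms:
            row += dummy(time, times)
        if "Individual" in terms:
            row += dummy(ind, individuals)
        if "Tissue" in terms:
            row += dummy(tis, tissues)
        rows.append(row)
    return rows


results = {}
for terms in (["Time", "Tissue"],
              ["Individual", "Tissue"],
              ["Time", "Individual", "Tissue"]):
    matrix = design(terms)
    name = "~ " + " + ".join(terms)
    results[name] = (len(matrix[0]), matrix_rank(matrix))  # (columns, rank)
    print(name, "columns:", results[name][0], "rank:", results[name][1])
```

Under these assumptions, ~ Time + Individual + Tissue has 24 columns but rank 21, because each time point column is just the sum of the columns of the individuals sampled at that time point. ~ Individual + Tissue has 21 columns and rank 21, so it is estimable. Since every individual belongs to exactly one time point, concatenating time and individual produces the same 20 groups as Individual alone.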

I can cheat a bit and make new individual numbers, that are just 1-5, so now New + Time + Tissue is full rank, because it thinks the same individuals are present in all 4 time point groups. Is this the right answer...or just a pretty good one?

What I want is for the software to understand that each sample is part of three separate groups, and for it to remove the influence from the timepoint and individual group to make the influence of the tissue group sharper. Is there an approach I am missing?

rna-seq deseq2 • 1.8k views

You don't have three separate groups, only 2. If you account for Individual then you've already accounted for Time. Honestly, just do that and be done with it.

Telling the software that people 1-5 are present at all time points (so you can have ~Individual + Time + Tissue) isn't going to gain you anything. Assuming these are human samples, you're likely to have fairly high variance between individuals, so the Individual variance is going to be higher in this setup, and I would worry that this would inflate the variance of the Tissue coefficient in the same way, thus tanking your power. I mean, it's not like this stuff takes terribly long to run, so you can check, but that'd be my expectation.
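One way to see why accounting for Individual already covers Time: in a balanced paired layout, the tissue estimate from an ordinary linear model with Individual + Tissue is the mean within-individual difference, so anything shared by both samples of an individual cancels, including the time point. The sketch below is a linear-model analogy on a log-expression scale, not DESeq2's negative binomial GLM, and every number in it (effect sizes, noise levels, 5 individuals per time point) is a made-up assumption.

```python
import random

random.seed(0)

n_individuals = 20
# Assumed layout: each individual is sampled at exactly one time point.
time_of = {ind: ind // 5 for ind in range(n_individuals)}

true_tissue_effect = 1.5  # hypothetical log2 fold change, tissue 2 vs tissue 1
individual_effect = {ind: random.gauss(0, 2) for ind in range(n_individuals)}
time_effect = {0: 0.0, 1: 3.0, 2: -2.0, 3: 5.0}  # hypothetical time shifts
noise = {(ind, tis): random.gauss(0, 0.1)
         for ind in range(n_individuals) for tis in (0, 1)}


def expression(ind, tis, include_time):
    """Simulated log expression for one sample (tis is 0 or 1)."""
    value = individual_effect[ind] + true_tissue_effect * tis + noise[(ind, tis)]
    if include_time:
        value += time_effect[time_of[ind]]
    return value


def mean_paired_difference(include_time):
    """Average of (tissue 2 - tissue 1) taken within each individual."""
    diffs = [expression(ind, 1, include_time) - expression(ind, 0, include_time)
             for ind in range(n_individuals)]
    return sum(diffs) / len(diffs)


with_time = mean_paired_difference(include_time=True)
without_time = mean_paired_difference(include_time=False)
print("tissue estimate with time effects:   ", with_time)
print("tissue estimate without time effects:", without_time)
```

The two estimates agree up to floating-point rounding, because the time shift is added to both samples of a pair and drops out of the difference. The same goes for the individual effect itself.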

written 3.0 years ago by Devon Ryan (96k)

Yeah, I guess you are right. Using Individual + Tissue was giving me the highest p-values, so I guess that's a sign that it was filtering away the most extraneous influences. Thanks.

written 3.0 years ago by swbarnes2 (8.6k)