Question: DESeq2: can I correct for relatedness when using data from multiplex families?
Asked 4.2 years ago by alesssia (510), London, UK:

Hi all,

I'm trying to perform a differential expression analysis using DESeq2. The dataset under study is composed of families (mostly siblings), and I wonder whether DESeq2 is suitable for analysing these data.

Can relatedness information be provided as an additional factor? If so, how? I've noticed that DESeq2 does not support mixed models, so the family information cannot be supplied as a random effect (as I usually do with lme). Is it possible to estimate a within-family correlation from the data and use it in the differential analysis (as limma does with duplicateCorrelation())?

My best plan B is to use only one individual per family, but this would shrink my dataset by about one third.

Does anyone have any suggestions?

Thanks in advance for your help,

Alessia

modified 4.2 years ago by Biostar (20) • written 4.2 years ago by alesssia (510)

Can't you give every family its own class (i.e. a familyID factor) and put it into the design matrix alongside all the other factors?

written 4.2 years ago by Phil S. (660)

Yup, that's how this is normally done. It's not exactly equivalent to a mixed-effects model, but given that the main point of a random effect is shrinkage, which DESeq2/edgeR/etc. already apply to the dispersions, it's debatable how much a mixed-effects model would actually gain, given the added hassle of dealing with one.

written 4.2 years ago by Devon Ryan (89k)
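A minimal sketch of this approach in DESeq2, assuming the package is installed. It uses the simulated counts from makeExampleDESeqDataSet() (condition A in samples 1-6, B in samples 7-12, no true differences), and the familyID labels are hypothetical: four families with one member in each condition plus four single-member families.

```r
library(DESeq2)

set.seed(1)
# Simulated counts: 1000 genes x 12 samples; samples 1-6 are condition A, 7-12 are B
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical family labels: F1-F4 have one member in A and one in B,
# F5-F8 are single-member families
dds$familyID <- factor(c("F1", "F2", "F3", "F4", "F5", "F6",
                         "F1", "F2", "F3", "F4", "F7", "F8"))

# Family enters as an ordinary fixed-effect factor; the variable of interest goes last
design(dds) <- ~ familyID + condition
dds <- DESeq(dds, quiet = TRUE)

resultsNames(dds)  # intercept, one coefficient per non-reference family, condition_B_vs_A
res <- results(dds, name = "condition_B_vs_A")
summary(res)
```

Because condition is the last term, results(dds) with no arguments would return the same comparison; naming the coefficient just makes it explicit.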

I see. OK, thanks to you both! Does having only one sample for some families cause any problem?

written 4.2 years ago by alesssia (510)

Yes, this would be problematic, since it would result in a rank-deficient design. You might check whether a model matrix can be supplied directly (DESeq() accepts one through its full argument). You could then create one with the usual model.matrix(~ ...) call and remove the columns for cases where you only have one sample per family.

written 4.2 years ago by Devon Ryan (89k)
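A base-R sketch of that suggestion. The design is hypothetical: each individual is sampled once in each condition, so the model has a subject term, coded as member numbers within each family (the same nesting trick the DESeq2 vignette uses for individuals nested within groups). Family F3 has only one member, which leaves an all-zero interaction column; dropping all-zero columns restores full rank.

```r
# Hypothetical design: F1 and F2 have two members, F3 has one;
# every member is sampled once per condition, and "member" is numbered
# 1, 2 inside each family (a subject term nested within family)
coldata <- data.frame(
  familyID  = factor(rep(c("F1", "F2", "F3"), times = c(4, 4, 2))),
  member    = factor(c(1, 1, 2, 2, 1, 1, 2, 2, 1, 1)),
  condition = factor(rep(c("control", "case"), times = 5),
                     levels = c("control", "case"))
)

mm <- model.matrix(~ familyID + familyID:member + condition, data = coldata)
qr(mm)$rank < ncol(mm)  # TRUE: the "familyIDF3:member2" column is all zero

# Drop the all-zero columns created by single-member families
all_zero <- apply(mm, 2, function(x) all(x == 0))
mm <- mm[, !all_zero, drop = FALSE]
stopifnot(qr(mm)$rank == ncol(mm))  # full rank again
colnames(mm)
```

The cleaned matrix could then be given to DESeq() in place of the formula, e.g. DESeq(dds, full = mm); the DESeq2 documentation describes supplying a user-built model matrix this way as advanced use that requires betaPrior = FALSE.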

My dataset is composed of roughly 100 families with one member and 200 families with two members. If I remove the families with only one sample, I will lose ~20% of my samples. That is still better than using one sample per family (which would lose ~40% of my samples and add the extra problem of choosing which member to drop), but it is not ideal. Is there any way to use the full dataset?

written 4.2 years ago by alesssia (510)

If you have a sample (subject) term in addition to familyID, then just remove one of those two columns from the model matrix for those 100 samples; then you won't lose any samples. If you're not including a subject term as well (I don't know your complete design), then having a single sample in a family won't be an issue.

written 4.2 years ago by Devon Ryan (89k)
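A small base-R check of the second case, with a hypothetical sample sheet that mimics the situation above at toy scale: two-member families split across conditions, two single-member families, and no subject term. The design stays full rank. The leverages from stats::hat() also show a caveat: under a fixed family effect, each single-member family has its own coefficient, so that sample is fitted exactly (leverage 1) and adds essentially no information about the condition effect.

```r
# Hypothetical sample sheet: F1 and F2 each have one control and one case;
# F3 and F4 are single-member families; no per-subject term
coldata <- data.frame(
  familyID  = factor(c("F1", "F1", "F2", "F2", "F3", "F4")),
  condition = factor(c("control", "case", "control", "case", "case", "control"),
                     levels = c("control", "case"))
)
mm <- model.matrix(~ familyID + condition, data = coldata)

qr(mm)$rank == ncol(mm)  # TRUE: single-member families do not make it rank-deficient

# Diagonal of the hat matrix; mm already contains an intercept column
lev <- hat(mm, intercept = FALSE)
round(lev, 3)  # samples 5 and 6 (the single-member families) have leverage 1
```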

I don't have a sample offset, so it seems my design is fine and my dataset is saved! Thanks!

written 4.2 years ago by alesssia (510)

So would you suggest using the following model?

~ Covariate + Technical.Confounder + condition + familyID

In the context of linear models this corresponds to adding an extra fixed effect, and I wonder whether that makes sense here. My objective is to find genes that are differentially expressed between conditions, while accounting for other (fixed) effects that may influence gene expression, such as covariates or technical confounders. I would like to treat family membership only as a basis for making inference about the sampled population, not as something of interest in itself, but I may be wrong in this interpretation!

What do you think?

written 4.2 years ago by alesssia (510)

Yep, that is what I was suggesting. However, if I recall correctly, DESeq2 always tests the last factor given in the design formula. Thus, in your example it would test for family differences. Devon, please correct me if I am wrong; I haven't looked into DESeq2 for a while now (shame on me... :D).

modified 4.2 years ago • written 4.2 years ago by Phil S. (660)

Well, it defaults to testing and plotting according to the last factor. For the results, you can easily specify what you want (I don't recall a convenient way to always do that for the plots).

But in general you're correct that things are a bit simpler if one puts a variable of interest last.

written 4.2 years ago by Devon Ryan (89k)
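A sketch of that point, assuming DESeq2 is installed and using simulated data from makeExampleDESeqDataSet(). Covariate, Technical.Confounder and familyID are hypothetical stand-ins for the variables in the formula proposed earlier in the thread (a random numeric covariate, a two-level batch that differs between the two members of each family, and families with one member per condition). With familyID last, results(dds) with no arguments reports the last family coefficient, while an explicit contrast returns the condition comparison regardless of term order.

```r
library(DESeq2)

set.seed(2)
# 500 simulated genes x 16 samples: samples 1-8 are condition A, 9-16 are B
dds <- makeExampleDESeqDataSet(n = 500, m = 16)

# Hypothetical stand-ins for the variables in the thread's formula
dds$Covariate <- rnorm(16)                                   # roughly centred numeric covariate
dds$Technical.Confounder <- factor(c(rep(c("b1", "b2"), 4),  # batch differs between
                                     rep(c("b2", "b1"), 4))) # the two members of a family
dds$familyID <- factor(paste0("F", c(1:8, 1:8)))             # samples i and i + 8 are relatives

# Same term order as in the question: familyID comes last
design(dds) <- ~ Covariate + Technical.Confounder + condition + familyID
dds <- DESeq(dds, quiet = TRUE)

tail(resultsNames(dds), 1)  # "familyID_F8_vs_F1": what results(dds) tests by default
res_default   <- results(dds)
res_condition <- results(dds, contrast = c("condition", "B", "A"))
```

The contrast argument works whatever order the terms are given in; putting condition last simply makes the defaults line up with the comparison of interest.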

Yeah, it was my mistake in writing the formula!

Thanks both for your help!

written 4.2 years ago by alesssia (510)

Hi,

In your solution, the model only accounts for whether samples belong to the same family. I would like to know how to add the degree of relatedness, such as MZ versus DZ twins. Currently, I have a variable zygosity with two levels (MZ and DZ).

Would you suggest using the following model?

~ Covariate + Technical.Confounder + familyID + zygosity + condition

And how would we do the same if the data also include other members of the family?

Regards,

Tiphaine

written 4.2 years ago by tiphaine (10)
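Whatever model ends up being used, a quick base-R rank check can flag this kind of design early. A minimal sketch with hypothetical twin pairs (each pair is one family, one twin per condition; Covariate and Technical.Confounder are left out for brevity): because zygosity is then constant within each family, its column is a linear combination of the familyID columns, so ~ familyID + zygosity + condition is rank-deficient and zygosity cannot be estimated alongside familyID as fixed effects.

```r
# Hypothetical twin data: four pairs (families), one twin per condition
coldata <- data.frame(
  familyID  = factor(rep(c("T1", "T2", "T3", "T4"), each = 2)),
  zygosity  = factor(rep(c("MZ", "DZ", "MZ", "DZ"), each = 2)),
  condition = factor(rep(c("control", "case"), times = 4),
                     levels = c("control", "case"))
)
mm <- model.matrix(~ familyID + zygosity + condition, data = coldata)

ncol(mm)     # 6 columns
qr(mm)$rank  # 5: the zygosityMZ column is confounded with familyID
```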

Please post this as a new question.

written 4.2 years ago by Devon Ryan (89k)