Question

DESeq2: can I correct for relatedness when using data from multiplex families?

0

10.5 years ago

alesssia ▴ 580

Hi All

I'm trying to perform a differential expression analysis using DESeq2. The dataset under study is composed of families (mostly siblings), and I wonder whether DESeq2 is suitable for analysing these data.

Can one provide relatedness information as an additional factor? If yes, how? I've noticed that DESeq2 does not support mixed models, and thus the family information cannot be supplied as a random effect (as I usually do with lme). Is it possible to estimate a within-family correlation from the data and use it for the differential analysis (as done by limma)?

My best plan B is to use only one individual from each family, but this would reduce my dataset by about 1/3.

Does anyone have any suggestions?

Thanks in advance for your help,

Alessia

DESeq2 RNA-Seq Multiplex-families • 4.9k views

ADD COMMENT • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by alesssia ▴ 580

3

Can't you assign (say) a new class for every family and put this into the design matrix alongside all the other factors?

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by Phil S. ▴ 700

2

Yup, that's how this is normally done. It's not exactly the same as using a mixed-effects model, but given that the point of using a random effect is dispersion shrinkage, which DESeq2/edgeR/etc. are doing anyway, it's debatable how much is actually gained with a mixed-effects model given the added annoyance of dealing with them.

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by Devon Ryan 105k

0

I see. OK, thanks both! Does having only one sample for some families create any problems?

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by alesssia ▴ 580

1

Yes, this would be problematic since it would result in a rank-deficient design. You might look and see if you can directly input a model matrix. You could then create one using the normal model.matrix(~...) commands and then just remove columns for cases where you only have one sample per family.
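As a rough illustration of the model-matrix approach, here is a minimal base R sketch that checks whether a design is rank deficient and drops the linearly dependent columns. The sample sheet, the `subject` column, and all level names are hypothetical. It relies on the default `qr()` in R moving dependent columns to the end of its pivot order. The DESeq2 vignette describes passing a user-built matrix to the `full` argument of `DESeq()`, but check the documentation for your version before relying on that.

```r
# Hypothetical sample sheet: three families, F3 has a single member,
# and every subject was sampled twice (control and treated).
samples <- data.frame(
  familyID  = factor(c("F1", "F1", "F1", "F1", "F2", "F2", "F2", "F2", "F3", "F3")),
  subject   = factor(c("s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4", "s5", "s5")),
  condition = factor(rep(c("control", "treated"), times = 5))
)

# Subjects are nested within families, so some columns are redundant.
mm <- model.matrix(~ familyID + subject + condition, data = samples)

# A design is rank deficient when it has more columns than its rank.
decomp <- qr(mm)
n_coef <- ncol(mm)       # coefficients requested by the formula
n_rank <- decomp$rank    # coefficients that can actually be estimated

# Keep a linearly independent subset: the first `rank` pivot columns.
keep    <- sort(decomp$pivot[seq_len(n_rank)])
mm_full <- mm[, keep, drop = FALSE]

# Columns that were removed because they duplicate information.
dropped <- setdiff(colnames(mm), colnames(mm_full))
```

Which redundant columns get dropped depends on column order, so it is worth inspecting `dropped` rather than assuming the result matches the comparison you care about.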

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by Devon Ryan 105k

0

My dataset is composed of roughly 100 families with one member and roughly 200 families with two members. If I remove the families with only one sample I will lose ~20% of my samples. That is still better than using one sample per family (in that case I would lose ~40% of my samples and have the extra problem of choosing which member to remove), but it is not ideal. Is there any way to use the full dataset?

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by alesssia ▴ 580

0

If you have a sample offset in addition to the familyID, then just remove one of those columns from the model matrix for those 100 samples. Then you won't lose any samples. If you're not including a subject offset as well (I don't know your complete design), then just having a single sample in a family won't be an issue.
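The second case can be checked with a small base R sketch, assuming a design that contains only `familyID` and `condition`. The data frame and level names are made up for illustration. A family with a single member still leaves the model matrix at full rank.

```r
# Hypothetical sample sheet: F1 and F2 have two members, F3 is a singleton.
samples <- data.frame(
  familyID  = factor(c("F1", "F1", "F2", "F2", "F3")),
  condition = factor(c("control", "case", "control", "case", "case"),
                     levels = c("control", "case"))
)

# Family as a fixed effect, with no per-subject term.
mm <- model.matrix(~ familyID + condition, data = samples)

# Full rank means every coefficient in the design can be estimated.
is_full_rank <- qr(mm)$rank == ncol(mm)
```

Note that the singleton's own family coefficient can absorb that sample almost entirely, so it likely adds little information about the condition effect even though it stays in the dataset.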

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by Devon Ryan 105k

0

I don't have a sample offset, so it seems my design is fine and my dataset is saved! Thanks!

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by alesssia ▴ 580

0

So would you suggest using the following model?

~  Covariate + Technical.Confounder + condition + familyID

In the context of linear models this corresponds to using an extra fixed effect, and I wonder whether this makes sense here. My objective is to find genes that are differentially expressed between conditions, taking into account other (fixed) effects that may influence gene expression, such as covariates or technical confounders. I would like to treat family membership only as a basis for making inferences about the sampled population, not as interesting in itself, but I may be wrong in this interpretation!

What do you think?

ADD REPLY • link 10.5 years ago by alesssia ▴ 580

1

Yep, that is what I was suggesting. However, if I recall correctly, DESeq2 always tests for the last factor given in the design formula. Thus, in your example it would test for family differences. Devon, please correct me here if I am wrong. I haven't looked into DESeq2 for a while now (shame on me... :D ).

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by Phil S. ▴ 700

0

Well, it'll default to testing and plotting according to the last factor. For the results, you can just specify what you want easily enough (I don't recall there being a convenient way to always do that for the plots).

But in general you're correct that things are a bit simpler if one puts a variable of interest last.
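To see which term ends up last, here is a small base R sketch using a made-up sample sheet. It only inspects `model.matrix()` column names, which differ from the coefficient names DESeq2 reports. In DESeq2 itself, `results()` without `contrast` or `name` reports the last variable in the design, and something like `results(dds, contrast = c("condition", "case", "control"))` requests the condition comparison explicitly (`dds` and the level names are assumptions here).

```r
# Hypothetical sample sheet with three families and two conditions.
samples <- data.frame(
  familyID  = factor(c("F1", "F1", "F2", "F2", "F3", "F3")),
  condition = factor(rep(c("control", "case"), times = 3),
                     levels = c("control", "case"))
)

# Helper (illustrative only): name of the last column of the model matrix.
last_coef <- function(design) {
  cn <- colnames(model.matrix(design, data = samples))
  cn[length(cn)]
}

# Family written last: the final coefficient is a family effect.
family_last <- last_coef(~ condition + familyID)

# Condition written last: the final coefficient is the condition effect.
condition_last <- last_coef(~ familyID + condition)
```

Writing the variable of interest last keeps the defaults aligned with the question being asked, which is the simplification mentioned above.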

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by Devon Ryan 105k

0

Yeah, it was my mistake in writing the formula!

Thanks both for your help!

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by alesssia ▴ 580

0

Hi,

In your solution, the model only accounts for whether samples belong to the same family. I would like to know how we can add the degree of relatedness, such as MZ versus DZ twins. Currently, I have a variable zygosity with 2 levels (MZ and DZ).

Do you thus suggest to use the following model?

~  Covariate + Technical.Confounder + familyID + zygosity + condition

And how would we do the same thing if there are also other family members?

Regards,
Tiphaine

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.4 years ago by tiphaine ▴ 10

0

Please post this as a new question.

ADD REPLY • link 10.4 years ago by Devon Ryan 105k