Hi,
I'm working on several projects that require differential expression analysis, and I have a question about the DESeq2 design formula for matched (not paired) samples.
I don't know if there is a standardized way of using these terms, but assuming that:
- Matched data will be two different populations in which an attempt has been made to reduce the variables by matching for certain characteristics that might impact the response being studied but which aren't the focus of the study. For example, age, gender, smoking status, etc.
- Paired data is two populations of numbers in which the same variable has been measured on the same population usually at two different times, or under two different conditions. For example, before and after treatment with a given drug.
(copied from https://community.jmp.com/t5/JMP-Wish-List/matched-versus-paired-data/idi-p/546968)
From the DESeq2 vignette, I understand that we should include paired samples in the design (https://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#can-i-use-deseq2-to-analyze-paired-samples). However, is this also the case for matched samples?
I'm specifically interested in age-matched control and disease samples from healthy/sick donors.
Thank you, Marieke
EDIT: side note on specific situation: adjacent tumor/healthy tissue from same individual
In the edgeR documentation, there is an example with 'RNA-Seq of oral carcinomas vs matched normal tissue' where they use the word matched for this situation and then add the patient to the design model: design <- model.matrix(~Patient+Tissue)
So here, the answer would be 'yes, you should add matching samples to the design model'. However, this situation is quite specific and could be somewhere in between the definitions of matched/paired written above.
edgeR defines paired samples as follows, but does not specify what it would consider matched samples:
- Paired samples occur whenever we compare two treatments and each independent subject in the experiment receives both treatments.
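To make the design formula quoted above concrete, here is a minimal base R sketch of what ~Patient+Tissue expands to. The three patients and the sample layout are made up for illustration and are not taken from the edgeR example data.

```r
# Hypothetical toy layout: 3 patients, each contributing a normal and a tumor sample
samples <- data.frame(
  Patient = factor(rep(c("P1", "P2", "P3"), each = 2)),
  Tissue  = factor(rep(c("Normal", "Tumor"), times = 3),
                   levels = c("Normal", "Tumor"))
)

# Same formula as in the design quoted above
design <- model.matrix(~ Patient + Tissue, data = samples)

# One column per fitted coefficient: intercept, patient offsets, tissue effect
print(colnames(design))
print(design)
```

The last column is the tumor vs normal effect after each patient's baseline has been absorbed by the Patient columns.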
Ah yes, looking at the PCA makes sense, thanks!
I'm not sure I understand your last paragraph, though. By 'running tests', do you mean running DESeq with and without that coefficient and checking the difference in results? That would be double dipping, but if I understand you correctly, it's not really an issue when the goal is only to decide whether a coefficient can be left out.
No, that's not what I mean.
When you run DESeq, it fits a value for each of the coefficients that you specify. You then compute p-values on some of those coefficients - this is what I mean by doing a test. If you were to do PCA, see a separation by a factor, then add that factor to the design and compute a p-value for its coefficient, that would be double dipping. Computing p-values for your coefficient of interest (e.g. Cancer/Not Cancer) with and without the matching factor, and then picking between the two, would also be dodgy practice. However, I think deciding from a PCA whether or not to include the matching criteria is okay, because you don't compute a p-value on the coefficient fitted to the matching factor (e.g. in your example above, you don't ask whether each patient is significantly different from the others).
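As a rough illustration of the difference between fitting a coefficient and testing it, here is a base R sketch for a single gene. It uses a Poisson GLM purely as a simple stand-in (an assumption for the example; DESeq2 itself fits a negative binomial GLM), and the counts, patient labels, and column names are all made up.

```r
# Made-up counts for one gene: 4 hypothetical patients, each sampled NotCancer and Cancer
toy <- data.frame(
  patient   = factor(rep(c("P1", "P2", "P3", "P4"), each = 2)),
  condition = factor(rep(c("NotCancer", "Cancer"), times = 4),
                     levels = c("NotCancer", "Cancer")),
  count     = c(10, 25, 14, 30, 8, 19, 12, 28)
)

# Poisson GLM as a simple stand-in (assumption); DESeq2 fits a negative binomial GLM
fit <- glm(count ~ patient + condition, family = poisson, data = toy)

# A value is fitted for every coefficient, including one per non-reference patient
coefs <- coef(summary(fit))
print(rownames(coefs))

# Only the coefficient of interest is tested; the patient terms are adjusted for, not tested
p_condition <- coefs["conditionCancer", "Pr(>|z|)"]
print(p_condition)
```

The patient coefficients are still estimated, but no question is asked of them; only the condition row is pulled out for a p-value.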