I am currently analyzing a dataset with the following sample table and DESeq2 design:
sample    group  continuous_value
sample_a  A      35
sample_b  A      10
sample_c  B      2
sample_d  B      5
design(Experiment) <- formula(~ continuous_value + group)
Each sample belongs to a group containing 5 individuals: Group A contains the WT samples and Group B the knockdown samples.
Each sample has an associated continuous value (a percentage). This value is the percentage of cells in the sample that are the ones I am interested in. In other words, each sample contains cells of the same cell type, but only x% of them have the phenotype I want to analyse. Since the percentage of cells of interest varies from one sample to another, I would like to normalize the results accordingly.
My question is the following: how does DESeq2 handle these continuous values? Is this design the most appropriate one?
I am afraid I do not fully understand the part of the DESeq2 vignette that discusses this.
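To make the question concrete, here is a minimal sketch of how the design formula above expands into a model matrix. It uses only base R and the four example rows from the table; the object names `coldata` and `mm` are my own and not part of DESeq2. As far as I understand it, a numeric column in the design enters the model as a single coefficient, so the reported log2 fold change for `continuous_value` would be per unit (here, per percentage point), while `groupB` would be the B vs A effect adjusted for the percentage.

```r
# Sample table rebuilt from the four example rows (names are illustrative).
coldata <- data.frame(
  sample = c("sample_a", "sample_b", "sample_c", "sample_d"),
  group = factor(c("A", "A", "B", "B")),
  continuous_value = c(35, 10, 2, 5)
)

# Expand the design formula into the matrix a GLM would be fitted on.
mm <- model.matrix(~ continuous_value + group, data = coldata)
print(mm)

# One column per coefficient: intercept, slope for the percentage, B vs A.
print(colnames(mm))

# The design can be fitted only if the rank equals the number of columns.
design_rank <- qr(mm)$rank
print(design_rank == ncol(mm))
```

With the numeric covariate the matrix has three columns and is full rank for these four rows, which is consistent with this version of the design running without the rank error.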
I have already tested three approaches:
- Including the percentage in the design
- Leaving the percentage out of the design
- Transforming the percentage into a small number of bins, as advised in the vignette. Unfortunately, I got the error: "Error in DESeqDataSet(se, design = design, ignoreRank) : the model matrix is not full rank, so the model cannot be fit as specified. One or more variables or interaction terms in the design formula are linear combinations of the others and must be removed". Moreover, we currently have no biological information that would allow us to cluster these percentages into groups, so the cut-offs between the bins are arbitrary.
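One possible cause of the full-rank error, sketched below with base R only, is that the bins end up perfectly confounded with the group. This is an assumption about what happened, not a confirmed diagnosis: the cut-off of 7.5 and the labels `low`/`high` are hypothetical and chosen only so that the four example rows reproduce the problem.

```r
# Same four example rows as in the table above (names are illustrative).
coldata <- data.frame(
  group = factor(c("A", "A", "B", "B")),
  continuous_value = c(35, 10, 2, 5)
)

# Hypothetical binning: one arbitrary cut-off at 7.5%.
coldata$bin <- cut(
  coldata$continuous_value,
  breaks = c(0, 7.5, 100),
  labels = c("low", "high")
)

# Cross-tabulate bins against groups: every A sample is "high", every B is "low".
print(table(coldata$bin, coldata$group))

# Expand the binned design and compare its rank with its number of columns.
mm_binned <- model.matrix(~ bin + group, data = coldata)
n_coef <- ncol(mm_binned)
mat_rank <- qr(mm_binned)$rank
print(c(columns = n_coef, rank = mat_rank))
```

Here the intercept column equals the sum of the `binhigh` and `groupB` columns, so the rank is lower than the number of coefficients and the bin effect cannot be separated from the group effect. Checking the same cross-table on the real 10 samples would show whether this applies to the actual data.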
Thanks in advance for your answers!