I am wondering about the correct experimental design for differential gene expression analysis in the following atypical setup.
We have several patients, and for each of them we have three RNA-seq measurements: one from pure tumor tissue (T), one from normal tissue (N), and one from a tissue that contains a (known) mixture of tumor and normal cells (M).
We are interested in genes that are differentially expressed between T and M, but only in the tumor cell fraction of M! Because M is contaminated with normal cells, the naive comparison of T with M yields too many DEGs: many genes are differentially expressed between tumor cells in T and normal cells in M. I am therefore looking for a way to "subtract" the known gene expression signature of N from M, to get a clearer picture of the genes that are differentially expressed only in tumor cells.
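To make the "subtraction" idea concrete, here is a minimal sketch of what I have in mind. It assumes a simple linear mixing model on normalized expression values, M = f * T_in_M + (1 - f) * N, where f is the known tumor cell fraction of M. The function name, the example values, and the clipping at zero are my own illustrative assumptions, not an established method, and the sketch ignores count noise and differences in RNA content per cell.

```python
def estimate_tumor_expression_in_mixture(mixture, normal, tumor_fraction):
    """Estimate per-gene expression of the tumor cells inside M.

    Assumes (hypothetically) that normalized expression mixes linearly:
        mixture = tumor_fraction * tumor_in_M + (1 - tumor_fraction) * normal
    All inputs must be on the same normalized scale, one value per gene.
    """
    if not 0.0 < tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be in (0, 1]")
    normal_fraction = 1.0 - tumor_fraction
    # Solve the mixing equation for tumor_in_M; clip negatives, which can
    # arise from noise, to zero.
    return [
        max((m - normal_fraction * n) / tumor_fraction, 0.0)
        for m, n in zip(mixture, normal)
    ]


# Made-up example: two genes, 60% tumor cells in M.
normal_expr = [100.0, 50.0]
mixture_expr = [160.0, 26.0]
tumor_in_m = estimate_tumor_expression_in_mixture(mixture_expr, normal_expr, 0.6)
```

The estimated values could then be compared with T, but they are no longer raw counts, so I am not sure they would be valid input for count-based tools.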
Can this be accomplished with a multi-factor design in DESeq2 or edgeR? Or should I just compute all three possible pairwise comparisons and do something like this:
DEG(T/M) = list of differentially expressed genes between tumor and mixture
DEG(T/N) = list of differentially expressed genes between tumor and normal
DEG(M/N) = list of differentially expressed genes between mixture and normal
DEG(T/M) due to differences in tumor cells = DEG(T/M) - intersect(DEG(T/N), DEG(M/N))
The logic here is that from all genes that are differentially expressed between T and M, we exclude those that also show up as differentially expressed in both comparisons with N, as these genes represent differences between tumor and normal cells, and not between the two tumor cell populations.
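Expressed as set operations, the filtering above would look roughly like the following sketch. The gene identifiers are placeholders, and the three input sets are assumed to come from the separate pairwise analyses.

```python
def tumor_specific_degs(deg_t_vs_m, deg_t_vs_n, deg_m_vs_n):
    """DEG(T/M) minus the genes found in both comparisons against N."""
    # Genes that differ from normal in both T and M are taken to reflect
    # tumor-vs-normal differences rather than T-vs-M tumor differences.
    shared_with_normal = deg_t_vs_n & deg_m_vs_n
    return deg_t_vs_m - shared_with_normal


# Placeholder gene sets for illustration only.
deg_tm = {"GENE_A", "GENE_B", "GENE_C"}
deg_tn = {"GENE_B", "GENE_C", "GENE_D"}
deg_mn = {"GENE_B", "GENE_E"}

candidates = tumor_specific_degs(deg_tm, deg_tn, deg_mn)
print(sorted(candidates))
```

Note that this works purely on thresholded gene lists, so a gene just above or below the significance cutoff in one comparison can flip the result.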
I find this a somewhat unsatisfying ad hoc solution, so I am open to any suggestions.