Question: Differential gene expression analysis in cell populations of mixed tumor and normal cells
Asked by Christian (Cambridge, US), 4.3 years ago:

I am wondering about the correct experimental design for differential gene expression analysis in the following atypical setup.

We have several patients from which we have three RNA-seq measurements each: one from pure tumor tissue (T), one from normal tissue (N), and one from a tissue that contains a (known) mixture of tumor and normal cells (M).

We are interested in genes that are differentially expressed between T and M, but only in the tumor cell fraction of M! Because M is contaminated with normal cells, the naive comparison of T with M yields too many DEGs: many genes are differentially expressed between the tumor cells in T and the normal cells in M. I am therefore looking for a way to "subtract" the known gene expression signature of N from M to get a clearer picture of the genes that are differentially expressed only in tumor cells.

Can this be accomplished using a multi-factorial experimental design in DESeq2 or edgeR? Or should I just run all three possible pairwise comparisons and do something like this:

DEG(T/M) = list of differentially expressed genes between tumor and mixture
DEG(T/N) = list of differentially expressed genes between tumor and normal
DEG(M/N) = list of differentially expressed genes between mixture and normal

DEG(T/M) due to differences in tumor cells = DEG(T/M) - intersect(DEG(T/N), DEG(M/N))

The logic here is that from all genes that are differentially expressed between T and M, we exclude those that also show up as differentially expressed in both comparisons with N, as these genes represent differences between tumor and normal cells, and not between the two tumor cell populations.

I find this a somewhat unsatisfying ad-hoc solution, so I am open to any suggestions.
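For concreteness, the intersect-and-subtract rule above can be written with plain Python sets. This is a minimal sketch: the function name and the gene IDs are hypothetical, and the three inputs are assumed to be lists of genes already called as differentially expressed (e.g. by DESeq2 or edgeR at some cutoff).

```python
def intersect_and_subtract(deg_t_m, deg_t_n, deg_m_n):
    """DEG(T/M) - intersect(DEG(T/N), DEG(M/N)), on collections of gene IDs."""
    # Genes called in both comparisons against N are treated as
    # tumor-vs-normal differences...
    tumor_vs_normal = set(deg_t_n) & set(deg_m_n)
    # ...and removed from the tumor-vs-mixture list.
    return set(deg_t_m) - tumor_vs_normal


# Hypothetical DEG lists (genes passing some adjusted p-value cutoff)
deg_t_m = {"GENE1", "GENE2", "GENE3", "GENE4"}  # tumor vs mixture
deg_t_n = {"GENE2", "GENE3", "GENE5"}           # tumor vs normal
deg_m_n = {"GENE3", "GENE5", "GENE6"}           # mixture vs normal

print(sorted(intersect_and_subtract(deg_t_m, deg_t_n, deg_m_n)))
# prints ['GENE1', 'GENE2', 'GENE4']
```

In the toy example, GENE2 is called in T/N but not in M/N, so it survives the filter. Because the tumor signal in M is diluted by normal cells, a real tumor-versus-normal difference can fall below the cutoff in M/N, which is one way a rule built on thresholded lists can misclassify genes.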


This is a little after the fact, but did you consider trying to find a cell surface marker and flow-sorting the cells from the different populations before sequencing? While finding a marker might be a pain, at least then the statistics become straightforward.

For what it's worth, the "intersect and subtract" method has issues, but it sounds like you're aware of that (unfortunately a perfect solution isn't coming immediately to me...hopefully someone else will be familiar with one).

BTW, do you have any estimates of the admixture in the mixed samples? I imagine that'd be useful in more correctly calling DE genes.

Reply by Devon Ryan, 4.3 years ago

Actually, we do have surface markers that we use to enrich for tumor cells with magnetic beads, but we cannot get rid of normal cells "sticking" to the tumor cells. So the best enrichment we achieve is 70-80% tumor cells, sometimes as low as 30%.

The admixture rate is known and could be used to model read counts as a linear combination of tumor and normal cell fractions. But I have no idea whether such a simple linear model is justified and applicable over the huge dynamic range of gene expression values.

Reply by Christian, 4.3 years ago
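A minimal sketch of the linear mixture model described in the comment above, under loudly stated assumptions: expression values are normalized and on the linear (not log) scale, the normal cells in M have the same profile as N, and p is the weight of tumor cells in M's RNA (this equals the cell fraction only if both cell types carry the same amount of RNA per cell). Solving m = p*t + (1 - p)*n for t gives a per-gene point estimate of tumor-cell expression in M that can be compared with T. The function names are hypothetical, the estimate carries no variance, and it is not a substitute for the count-based tests in DESeq2 or edgeR; the linearity concerns raised in this thread apply fully.

```python
import math


def tumor_component(m, n, p):
    """Estimate expression of the tumor cells inside a mixed sample M.

    Assumes (hypothetically) m = p*t + (1 - p)*n per gene on the linear scale,
    where m and n are normalized, non-log expression values for the same genes
    in M and N, t is the unknown tumor-cell expression in M, and p is the
    tumor fraction of M.
    """
    if not 0 < p <= 1:
        raise ValueError("tumor fraction p must be in (0, 1]")
    # Solve the mixture equation for t; noise can make it negative, so clip at 0.
    return [max((mg - (1 - p) * ng) / p, 0.0) for mg, ng in zip(m, n)]


def log2_fold_change(ref, est, pseudocount=1.0):
    """Per-gene log2((ref + pc) / (est + pc)), e.g. T versus the tumor part of M."""
    return [math.log2((a + pseudocount) / (b + pseudocount)) for a, b in zip(ref, est)]


# Toy example with made-up values
n = [20.0, 50.0, 40.0]
m = [68.0, 26.0, 16.0]  # consistent with t = [100, 10, 0] and p = 0.6
t_in_m = tumor_component(m, n, p=0.6)
```

Clipping at zero is a convenience, not a correction: a negative estimate means M shows less expression than its normal fraction alone would predict, so such genes are worth flagging rather than silently zeroing.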

Just found a recent review (Oct 2013) that is a great place to start digging into this topic:

Shen-Orr and Gaujoux (2013)
Computational Deconvolution: Extracting Cell Type-Specific Information from Heterogeneous Samples

Table 1 provides a great summary of available methods, some of which look promising for my problem at hand.

Reply by Christian, 4.3 years ago

Keep in mind that deconvolution methods tend to work better with microarray data; it's one of the few areas where microarrays are still better. The reason is that signals from different cell-types/tissues will sum more linearly in microarrays than RNAseq, where the sum is highly non-linear. This more linear summation allows for cleaner component separation (many methods rely on ICA or similar techniques, so I'm using that nomenclature here).

Reply by Devon Ryan, 4.3 years ago
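As an illustration of the linear-summation assumption Devon mentions, and of how an admixture estimate could be obtained from the data, here is a minimal two-component deconvolution by least squares. It is not ICA and not any specific published method; it assumes hypothetical reference profiles t (tumor) and n (normal) on the linear scale and fits m ≈ p*t + (1 - p)*n with a single p across genes. Using T as the tumor reference assumes the tumor cells in M resemble T for most genes, so where p is already known from the enrichment, this is at most a sanity check.

```python
def estimate_tumor_fraction(m, t, n):
    """Least-squares estimate of p in m ≈ p*t + (1 - p)*n, clipped to [0, 1].

    m, t, n: normalized linear-scale expression for the same genes in the
    mixed sample, a tumor reference profile and a normal reference profile.
    Rearranged, m - n ≈ p * (t - n): a one-parameter regression through the origin.
    """
    num = sum((mg - ng) * (tg - ng) for mg, tg, ng in zip(m, t, n))
    den = sum((tg - ng) ** 2 for tg, ng in zip(t, n))
    if den == 0:
        raise ValueError("tumor and normal references are identical; p is not identifiable")
    # The squared error is convex in p, so clipping the unconstrained
    # optimum gives the optimum on [0, 1].
    return min(max(num / den, 0.0), 1.0)


# Made-up profiles; m is simulated as a noise-free 25% tumor mixture
t = [100.0, 10.0, 0.0, 30.0]
n = [20.0, 50.0, 40.0, 30.0]
m = [0.25 * a + 0.75 * b for a, b in zip(t, n)]
print(round(estimate_tumor_fraction(m, t, n), 6))  # prints 0.25
```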

What if we log-transform the RNA-seq data first, for example using the regularized log transformation (rlog) or variance stabilizing transformation (VST) from DESeq2? Could we then feed the transformed RNA-seq data into array-based deconvolution methods?

Reply by Christian, 4.3 years ago

No transformation will solve that problem, since the non-linearity is already present in the original data.

Reply by Devon Ryan, 4.3 years ago
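A related point, separate from the argument above: mixing happens on the count scale, and rlog and VST behave roughly like log2 for well-expressed genes, so a transformed mixture is no longer a linear combination of the transformed components. The sketch below (hypothetical function names, made-up values) compares log2 of a 50:50 mixture with the 50:50 average of the logs.

```python
import math


def log_of_mixture(t, n, p):
    """log2 of what is actually observed: mixing happens on the count scale."""
    return math.log2(p * t + (1 - p) * n)


def mixture_of_logs(t, n, p):
    """What a linear deconvolution applied to log-scale data would assume."""
    return p * math.log2(t) + (1 - p) * math.log2(n)


# Made-up values: a gene 100-fold higher in tumor than in normal cells, 50% tumor
print(round(log_of_mixture(1000, 10, 0.5), 2))   # prints 8.98 (= log2(505))
print(round(mixture_of_logs(1000, 10, 0.5), 2))  # prints 6.64 (= log2(100))
```

Because log2 is concave, the log of the mixture is always at least the weighted mean of the logs, with equality only when t equals n; the gap grows with the size of the fold change in either direction, so no single mixing weight p fits all genes on the log scale.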

"The reason is that signals from different cell-types/tissues will sum more linearly in microarrays than RNAseq, where the sum is highly non-linear"

Can you please point me toward a reference for this?  I'm relatively new to the informatics scene.

Reply by Ed Glass, 3.6 years ago

I don't have any references handy at the moment, so you'll have to look them up. Any paper on signal separation will likely mention that the signals need to be independent for optimal performance, which they self-evidently aren't in RNA-seq.

Reply by Devon Ryan, 3.6 years ago

I'm curious what you are referring to here. Do you mean that the "signals" (counts) you get per gene within a sample are not independent in RNA-seq, but are in microarrays?

Is this "self-evident" because RNA-seq is inherently a competitive assay (since each gene is "competing" for read counts given the fixed depth of sequencing) vs. microarray which is not (aside from off-target effects to unintended probes)?

Reply by Steve Lianoglou, 3.5 years ago

Yup, exactly. And yeah, this ignores the cross-hybridization issue in microarrays.

Reply by Devon Ryan, 3.5 years ago
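One concrete consequence of the competitive sampling described in this exchange, under idealized noise-free sampling: each library only measures relative abundance, so when two cell types with different total RNA per cell are mixed, their normalized profiles are weighted by each type's share of total RNA, not by its share of cells. The sketch below uses hypothetical per-cell transcript counts and function names; it illustrates only this weighting distortion and is not meant to capture everything Devon means by non-linear summation.

```python
def mixture_proportions(a, b, p):
    """Expected per-gene read proportions for a mixture of two cell types.

    a, b: hypothetical absolute transcript counts per cell for types A and B
    p: fraction of cells in the mixture that are type A
    Idealized sequencing: reads are drawn in proportion to transcript
    abundance, so only relative abundance is observed.
    """
    pooled = [p * ag + (1 - p) * bg for ag, bg in zip(a, b)]
    total = sum(pooled)
    return [x / total for x in pooled]


def effective_weight(a, b, p):
    """Weight of A's normalized profile in the mixture: its share of total RNA."""
    ra, rb = sum(a), sum(b)
    return p * ra / (p * ra + (1 - p) * rb)


# Made-up example: type A carries 10x more transcripts per cell than type B
a, b, p = [900.0, 100.0], [50.0, 50.0], 0.5
mix = mixture_proportions(a, b, p)
a_norm = [x / sum(a) for x in a]
b_norm = [x / sum(b) for x in b]
naive = [p * x + (1 - p) * y for x, y in zip(a_norm, b_norm)]  # cell fraction as weight
w = effective_weight(a, b, p)
print(round(mix[0], 3), round(naive[0], 3), round(w, 3))  # prints 0.864 0.7 0.909
```

Here 50% of the cells are type A, but A's normalized profile enters the mixture with a weight of about 0.91, so a linear model that plugs in the cell fraction as the weight would misread the mixture.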

Answer by Sean Davis (National Institutes of Health, Bethesda, MD), 4.3 years ago:

You might take a look at this paper for some ideas:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4016702/

This method might work for your gene expression dataset as well. In any case, this might be a direction worth pursuing.


Interesting, I hadn't seen this in the context of methylation.  But there is quite a bit of literature on deconvolution in gene expression datasets derived from cellular mixtures.  As one place to start, take a look at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3699332/ and the references therein.

Reply by Ahill, 4.3 years ago