Question: Deconvolution Methods on RNA-Seq Data (Mixed cell types)
Paul.Lin (110) wrote 3.5 years ago:

Dear all, 

I want to use deconvolution methods to estimate the proportions of different cell types in my RNA-Seq samples. In this post (Differential gene expression analysis in cell populations of mixed tumor and normal cells), it's mentioned that "signals from different cell-types/tissues will sum more linearly in microarrays than RNAseq, where the sum is highly non-linear" and "Any paper talking about signal separation will likely mention that the signals need to be independent for optimal performance, which they self-evidently aren't in RNAseq." Could someone please explain to me why in RNA-Seq samples the signals from different cell-types/tissues are not independent, or why the signals don't sum linearly?

Also, if I do decide to go ahead with using deconvolution methods, should I apply the deconvolution methods to raw RNA-Seq counts, log(CPM) transformed data, or voom transformed data?



rna-seq deconvolution • 13k views
modified 17 months ago by inesdesantiago (160) • written 3.5 years ago by Paul.Lin (110)

Someone in our group just gave a journal club talk on "CIBERSORT". They at least claim that it can be used for RNAseq data and might be useful for you if you just want to know something like, "what percentage of each sample is composed of one of a number of cell types". I'm still a bit dubious about the method, but in theory it or something like it could possibly work.

written 3.5 years ago by Devon Ryan (89k)

Agreeing with Devon: I recently used this software on RNAseq data, and it does seem to give results for RNAseq as well, in terms of percentages of the different cell types.

written 2.5 years ago by Ron (920)

CIBERSORT is designed for immune cell types. If you aren't specifically looking at a mixture of immune cells, you might want to use a more generalized deconvolution strategy.

If you are looking at bulk tumor expression, I would typically expect some sort of percent tumor value from the pathologist, which you could use in your differential expression model (if that's available for your samples, that might be a good alternative / positive control option).

written 2.3 years ago by Charles Warden (6.6k)

Does anyone know of other cell type signatures besides LM22 from CIBERSORT, which has only 22 cell types?

Also, are there any other tools for RNAseq besides CIBERSORT and DeconRNASeq?

written 2.2 years ago by Ron (920)

Dvir Aran from Atul Butte's lab at UCSF has recently come out with a new tool, xCell, for RNAseq-based deconvolution that might be worth looking into (it is very easy to use):

written 2.1 years ago by stephanie.hilz (40)

If you can wait, I will have a new cell deconvolution method coming out in a publication. It was tailored for detecting immune cell populations from RNA-seq.

modified 5 months ago • written 17 months ago by Kevin Blighe (41k)

Hi Kevin

Just curious if this has been published yet? Looking into a variety of deconvolution methods for RNA-seq, and would be very interested in the method you've developed.

written 5 months ago by Alex.blain (0)

Hi Alex, that work is being continued by my now former colleagues, as I moved to the USA in 2016. I am still in touch, however, and I understand that they are still trying to publish the work. Note that the deconvolution part is only one part of a manuscript that is heavily focused on molecular biology. Have no other programs been released in this area yet?

written 5 months ago by Kevin Blighe (41k)

Devon Ryan (89k), Freiburg, Germany, wrote 3.5 years ago:

Since you're referring to something I wrote, perhaps it's best if I reply :)

The thing with RNAseq is that it's a zero-sum game: there is a finite (though large) number of reads that will be sequenced, and all of the transcripts/genes are competing with each other for them. This creates a dependence between the measured expression of each transcript/gene, since every read sequenced from gene A means that there's one fewer read that can be sequenced from gene B (and C and D and...). The question then becomes how big of a problem this is. I don't really know the answer to that; perhaps someone has done a study on it.
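A minimal sketch of that competition, assuming a fixed read budget that is shared strictly in proportion to transcript abundance (gene length and sampling noise are ignored, and all gene names and numbers are made up). Only gene A changes in absolute abundance, yet the expected reads for B and C drop:

```python
# Hypothetical absolute transcript abundances in two conditions.
abundance_before = {"A": 1000, "B": 1000, "C": 1000}
# Only gene A goes up; B and C keep the same absolute abundance.
abundance_after = {"A": 4000, "B": 1000, "C": 1000}

DEPTH = 3_000_000  # assumed fixed number of sequenced reads


def expected_reads(abundance, depth=DEPTH):
    """Expected reads per gene when a fixed read budget is shared in
    proportion to abundance (gene length ignored for simplicity)."""
    total = sum(abundance.values())
    return {gene: depth * n / total for gene, n in abundance.items()}


before = expected_reads(abundance_before)
after = expected_reads(abundance_after)

for gene in sorted(before):
    print(gene, round(before[gene]), round(after[gene]))
```

In this toy case B and C each lose half of their expected reads although nothing about them changed biologically, which is the dependence described above.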

Regarding what to use, you might have luck with logCPM.

written 3.5 years ago by Devon Ryan (89k)

I just wanted to share my intuition that this might be an important concern. A lot of approaches rely on signature genes that are highly expressed in 'pure' tissues. Say A and B are signature genes for tissue 1 and C is one for tissue 2. If all of these genes are expressed at equally elevated levels and the true mixture is 80/20, then A and B together might 'steal' disproportionately many reads, leaving gene C with fewer reads and thereby underestimating the contribution of tissue 2.

written 3.5 years ago by Michael Dondrup (46k)

@Devon: I am not sure I understand this point correctly.

My understanding is that this premise holds when the transcript availability (i.e., expression quantity) of gene B (or C) is limiting compared to A. If transcripts for A and B (e.g., housekeeping genes) are expressed at identical levels, does the premise still hold, i.e., do reads still compete to get sequenced?

In addition, doesn't it depend on the length of the gene as well?

written 3.5 years ago by cpad0112 (11k)

Even if genes are identically expressed they'll still be competing. All genes/transcripts are competing against each other. This is the one nice thing about microarrays, since the probes are independent.

Yes, length comes into play too.

written 3.5 years ago by Devon Ryan (89k)

Thanks, Devon! I understand the dependence between gene-level RNA-Seq reads now. :) I need some time to think about what the possible consequences are; meanwhile do you have any suggestion on how to estimate the proportions of different cell types apart from deconvolution methods?

With regard to logCPM, I understand that CPM normalises the data by library size and hence makes data from different samples comparable, but what's the purpose of the log? Is it to make the data more like microarray data? If so, doesn't the voom transformation make RNA-Seq data even more like microarray data, and hence a better option here?

written 3.5 years ago by Paul.Lin (110)

If you still have the raw samples then I've had excellent luck with qPCR. In fact, the only reason I'm familiar with this is that we (and much of the field it turned out) had contaminated samples that were screwing up results. qPCR ended up being the best method to prescreen things before sequencing. I had tried signal separation methods but never got great results (it worked well for microarray datasets though).

The purpose of the log is to change the range of the data so it is no longer bounded below at 0, but instead extends from -infinity to +infinity. The math tends to behave better when you don't have restricted ranges (this is also why people use log2 fold change for everything in RNAseq).
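As a rough illustration of the two scales, here is a small sketch with made-up counts. The `PRIOR` pseudo-count and the `lib_size + 2 * prior` offset are assumptions chosen to resemble the log2-CPM form that voom is usually described as using; check the documentation of whichever tool you use rather than relying on this.

```python
import math

# Made-up raw counts for three genes in one library.
counts = {"geneA": 0, "geneB": 50, "geneC": 950}
library_size = sum(counts.values())

PRIOR = 0.5  # assumed pseudo-count so that zero counts stay finite


def cpm(count, lib_size):
    """Counts per million: linear scale, bounded below at 0."""
    return count / lib_size * 1e6


def log2_cpm(count, lib_size, prior=PRIOR):
    """log2 counts per million with a small offset (assumed form)."""
    return math.log2((count + prior) / (lib_size + 2 * prior) * 1e6)


for gene, n in counts.items():
    print(gene, cpm(n, library_size), round(log2_cpm(n, library_size), 2))

# On the log2 scale a doubling and a halving are symmetric around 0,
# whereas the linear ratios 2 and 0.5 are not.
print(math.log2(2.0), math.log2(0.5))
```

The last line prints 1.0 and -1.0, which is the symmetry that makes log2 fold changes convenient to work with.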

written 3.5 years ago by Devon Ryan (89k)

@ Devon


modified 3.5 years ago • written 3.5 years ago by cpad0112 (11k)

Hi Ryan.

I am still a bit confused about how such a zero-sum game would cause non-linearity. It may cause some sort of dependency between variables, but the overall expression is still the weighted sum of the expression of its components (the different cell types), right?
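To make the question concrete, here is a toy calculation under a deliberately simplified model (no sampling noise, no gene-length effects; the cell types, genes and numbers are all made up). In this model the relative profile of the bulk sample is indeed a weighted sum of the relative profiles of its components, but the weights are the fractions of total RNA each cell type contributes, which differ from the cell fractions whenever the cell types carry different amounts of RNA per cell:

```python
# Hypothetical transcripts per cell for two cell types.
per_cell = {
    "type1": {"g1": 900, "g2": 100},     # 1,000 transcripts per cell
    "type2": {"g1": 1000, "g2": 9000},   # 10,000 transcripts per cell
}
cell_fraction = {"type1": 0.8, "type2": 0.2}
genes = ["g1", "g2"]


def relative(profile):
    """Scale a profile so that it sums to 1 (what sequencing measures)."""
    total = sum(profile.values())
    return {g: v / total for g, v in profile.items()}


# What is actually in the tube: absolute transcripts summed over cells.
bulk_abs = {g: sum(cell_fraction[t] * per_cell[t][g] for t in per_cell)
            for g in genes}
bulk_rel = relative(bulk_abs)

# Weighting the relative profiles by CELL fraction does not reproduce it.
naive = {g: sum(cell_fraction[t] * relative(per_cell[t])[g] for t in per_cell)
         for g in genes}

# Weighting by RNA fraction (cell fraction x RNA per cell) does.
rna = {t: cell_fraction[t] * sum(per_cell[t].values()) for t in per_cell}
rna_total = sum(rna.values())
rna_fraction = {t: v / rna_total for t, v in rna.items()}
by_rna = {g: sum(rna_fraction[t] * relative(per_cell[t])[g] for t in per_cell)
          for g in genes}

print("bulk   ", bulk_rel)
print("naive  ", naive)
print("by RNA ", by_rna)
```

So in this sketch a linear deconvolution of relative expression would recover something like RNA fractions rather than cell fractions. Whether that matters in practice depends on how different the cell types are in RNA content.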

written 3 months ago by CY (310)

This thread is very old and it is holiday season. However, just plot RNA-seq count data as a histogram and you will clearly appreciate the non-linearity. It has been found that a negative binomial distribution is better for modeling RNA-seq count data.

Your second question relates more to the type of deconvolution that is being used. I am yet to see any clear winner in terms of methods for RNA-seq deconvolution.

written 3 months ago by Kevin Blighe (41k)

Shicheng Guo (7.4k) wrote 2.8 years ago:

Whether or not you need to log-transform depends on the method (code/script). It is easy to decide: make a mixture yourself, run the deconvolution with their code/script, and compare the input proportions with the deconvolution result. In my experience, you need to try raw data, counts, signal, log-transformed and logit-transformed values, and then you can find which works best. I prefer log and logit transforms.
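A minimal sketch of such a self-made mixture test, assuming only two cell types and using a plain least-squares estimate of the mixing fraction. This is a stand-in for whatever deconvolution tool you are evaluating, not any published method, and the signature values are invented:

```python
import math

# Made-up linear-scale signatures (e.g. CPM) for two pure cell types.
sig1 = [1000.0, 10.0, 200.0, 50.0]
sig2 = [10.0, 800.0, 100.0, 400.0]
TRUE_FRACTION = 0.8  # proportion of cell type 1 in the artificial mixture


def mix(f, a, b):
    """In silico mixture: linear combination of the two signatures."""
    return [f * x + (1 - f) * y for x, y in zip(a, b)]


def estimate_fraction(mixture, a, b):
    """Least-squares estimate of f in: mixture ~ f * a + (1 - f) * b."""
    num = sum((m - y) * (x - y) for m, x, y in zip(mixture, a, b))
    den = sum((x - y) ** 2 for x, y in zip(a, b))
    return num / den


def log2_all(values):
    """log2(x + 1) transform applied to every value."""
    return [math.log2(v + 1) for v in values]


mixture = mix(TRUE_FRACTION, sig1, sig2)

# Same estimator, once on linear values and once on log-transformed values.
f_linear = estimate_fraction(mixture, sig1, sig2)
f_log = estimate_fraction(log2_all(mixture), log2_all(sig1), log2_all(sig2))

print("true:", TRUE_FRACTION)
print("estimated on linear scale:", round(f_linear, 3))
print("estimated on log2 scale:  ", round(f_log, 3))
```

With this particular linear estimator the linear-scale input recovers the true fraction and the log-scale input does not. A different method may behave differently, which is exactly why running the comparison against a known mixture is worthwhile.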

written 2.8 years ago by Shicheng Guo (7.4k)

ankurchakravarthy (20), United Kingdom, wrote 3.4 years ago:

Do NOT use log2 cpms. The data need to be in non-log linear space. 

I quote:

"The samples profiled within PRECOG primarily represent bulk diagnostic pre-therapy tumor specimens, which often contain a variety of cell types, including diverse TALs. Given the enrichment of lymphocyte markers in favorably prognostic genes across PRECOG (Figs. 1d and 2d), a method to systematically 'unmix' or deconvolve bulk tumor GEPs in PRECOG may reveal new insights into tumor immunobiology. We recently developed a new approach for CIBERSORT, a machine-learning method that outperformed other approaches in benchmarking experiments [16]. CIBERSORT produces an empirical P value for the deconvolution using Monte Carlo sampling. Like other linear deconvolution methods, CIBERSORT only operates on expression values in non-log linear space [75]."

written 3.4 years ago by ankurchakravarthy (20)

Agreed! The method is for data in linear space.
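A small arithmetic sketch of why the scale matters for a linear method, using made-up expression values for a single gene. The helper `unlog2` is hypothetical and assumes the data were transformed as log2(x + offset); you need to know the actual offset your pipeline used:

```python
import math

pure_a = 1024.0  # linear-scale expression in cell type A (made up)
pure_b = 4.0     # linear-scale expression in cell type B (made up)

# A physical 50/50 mixture averages the linear values.
mixed_linear = 0.5 * pure_a + 0.5 * pure_b

# Averaging log2 values instead corresponds to a geometric mean.
mixed_log = 0.5 * math.log2(pure_a) + 0.5 * math.log2(pure_b)
implied_linear = 2 ** mixed_log

print(mixed_linear)    # 514.0
print(implied_linear)  # 64.0


def unlog2(values, offset=0.0):
    """Undo a log2(x + offset) transform to get back to linear space."""
    return [2 ** v - offset for v in values]


print(unlog2([10.0, 2.0]))  # [1024.0, 4.0]
```

A linear model fitted to log-scale values would expect something near 64 for this gene in a 50/50 mixture, whereas the real mixture sits at 514.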

written 2.3 years ago by ash (0)

Ron (920), United States, wrote 2.3 years ago:

Here is another package that can be used:

written 2.3 years ago by Ron (920)

inesdesantiago (160), United Kingdom, wrote 17 months ago:

New tools for RNA-seq tumors:

quanTIseq and DeMixT

written 17 months ago by inesdesantiago (160)