Question: Differential expression with MAST using scran-normalized data
smt8n wrote (12 months ago):

Hi,

I am trying to identify differentially expressed (DE) genes in an scRNA-seq dataset using MAST. The data I am supplying were normalized with the scran package. In the MAST vignette, as well as in a separate tutorial, they scale the column sums before running the regression and use the result as another regression variable, alongside "condition". This looks like another normalization-like step to me. The vignette uses RSEM data as input, which, as I found elsewhere, needs normalizing. Do I still need this step if I supply data that were already normalized with scran? Thank you.

scran mast scrna-seq • 609 views
Charles Warden (Duarte, CA) wrote (12 months ago):

That is a good question.

I had thought you would provide raw counts, but I can see a tutorial where the input is log2(TPM + 1) expression values. The Finak et al. 2015 paper also says "MAST has greatest efficiency when the continuous (log)-expression is normally distributed". So my assumption that raw counts are the expected input may not be correct.

So, the developer can probably provide a better answer than I can. However, I can see that the input to the lrTest() function is a specific type of object ("LMlike or subclass", so not a data frame or matrix of counts), and the zlm() function (or the LRT() function) takes a "SingleCellAssay" object as input. So, I think there are at least 3 strategies for p-value calculations, but the code for differential expression under any particular strategy would be similar regardless of which upstream normalization is used.

smt8n wrote (12 months ago):

Charles, thank you for the answer. At this point, I think the "cngeneson" factor should stay in the model, since the package authors' whole assumption, as they state in their paper, is that "CDR is a proxy for unobserved nuisance factors that should be explicitly modeled" (CDR being the cellular detection rate). It is still not quite clear to me how prior normalization affects the procedure, but since the authors do not mention any caveat about it, I assume that keeping the factor is fine.
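
To make the role of the detection-rate covariate concrete, here is a minimal Python sketch (not MAST code). It assumes the covariate is the number of genes detected per cell, centered and scaled, which is how I read the vignette's "cngeneson". The toy counts, the size factors, and the function name `detection_covariate` are made up for illustration. Under those assumptions, dividing by positive size factors and taking log2(x + 1) leaves zeros as zeros, so the covariate comes out the same before and after normalization.

```python
import math

def detection_covariate(matrix):
    """Z-scored number of detected genes per cell.

    matrix: list of gene rows, each a list of per-cell values (columns = cells).
    Hypothetical helper; the sample standard deviation (n - 1) is used,
    on the assumption that this mirrors what R's scale() does.
    """
    n_cells = len(matrix[0])
    # A gene counts as "detected" in a cell if its value is above zero.
    detected = [sum(1 for row in matrix if row[j] > 0) for j in range(n_cells)]
    mean = sum(detected) / n_cells
    var = sum((d - mean) ** 2 for d in detected) / (n_cells - 1)
    sd = math.sqrt(var)
    return [(d - mean) / sd for d in detected]

# Toy raw counts: 4 genes (rows) x 4 cells (columns). Invented values.
raw_counts = [
    [5, 0, 2, 0],
    [0, 0, 7, 1],
    [3, 4, 0, 0],
    [1, 0, 9, 0],
]

# Invented per-cell size factors, standing in for scran's output.
size_factors = [0.8, 0.5, 1.9, 0.4]

# Size-factor normalization followed by log2(x + 1); zeros remain zeros.
log_norm = [
    [math.log2(c / s + 1) for c, s in zip(row, size_factors)]
    for row in raw_counts
]

cdr_raw = detection_covariate(raw_counts)
cdr_norm = detection_covariate(log_norm)

print(cdr_raw)
print(cdr_raw == cdr_norm)
```

If this reading is right, the covariate depends only on which genes are non-zero in each cell, not on expression magnitude, so it would not simply repeat what a size-factor normalization already did.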
