Question

Differential expression with MAST using scran-normalized data

6.9 years ago

smt8n • 0

Hi,

I am trying to get DE genes from scRNA-seq data using MAST. The data I am supplying were normalized with the scran package. In the MAST vignette, as well as in a separate tutorial, they take the column sums of detected (non-zero) genes per cell, scale them (the "cngeneson" variable, i.e. the cellular detection rate), and use the result as another regression variable besides "condition" when running the regression. This looks like another normalization-like step to me. In the vignette, they use RSEM data as input, which, as I found elsewhere, needs normalizing. Do I need this step if I supply data that were already normalized with scran? Thank you.
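For reference, here is a minimal Python/numpy sketch of what that vignette step computes, assuming it follows the R idiom scale(colSums(assay(sca) > 0)): count the detected genes in each cell, then center and divide by the sample standard deviation. The function name and toy matrix are hypothetical illustrations, not part of MAST.

```python
import numpy as np


def cellular_detection_rate(expr: np.ndarray) -> np.ndarray:
    """Standardized number of detected genes per cell (hypothetical helper).

    expr: genes x cells matrix of (log-)expression values. A gene counts as
    detected in a cell when its value is > 0. Centering and dividing by the
    sample standard deviation (ddof=1) matches what R's scale() does.
    """
    detected = (expr > 0).sum(axis=0).astype(float)  # genes detected per cell
    return (detected - detected.mean()) / detected.std(ddof=1)


# Toy 4-gene x 3-cell matrix (made-up values for illustration)
expr = np.array([
    [0.0, 1.2, 3.1],
    [2.5, 0.0, 1.0],
    [0.0, 0.0, 2.2],
    [1.1, 0.7, 0.0],
])
cngeneson = cellular_detection_rate(expr)
```

Note that the expression matrix itself is not rescaled: the resulting vector only enters the model as a covariate, as in a formula like ~ condition + cngeneson.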

mast scran scRNA-seq • 4.3k views

score 0 · Answer 1 · 2018-08-28

That is a good question.

I had thought you would provide raw counts, but I can see a tutorial where the input is log2(TPM + 1) expression values. The Finak et al. 2015 paper also says "MAST has greatest efficiency when the continuous (log)-expression is normally distributed". So, I think my assumption about using raw counts may not be correct.
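Both log2(TPM + 1) and scran-based normalized values are log2(x + 1) transforms of depth-adjusted expression. As a rough sketch, assuming the scran workflow ended with scater-style log-normalization (counts divided by size factors centered at 1, then log2 with a pseudo-count of 1), the transform looks like this; the function name and numbers are hypothetical.

```python
import numpy as np


def log_normalize(counts, size_factors, pseudo_count=1.0):
    """log2(count / centered size factor + pseudo_count), per cell (column)."""
    counts = np.asarray(counts, dtype=float)      # genes x cells
    sf = np.asarray(size_factors, dtype=float)
    sf = sf / sf.mean()                           # center size factors at unity
    return np.log2(counts / sf + pseudo_count)    # sf broadcasts across columns


# Toy example: 2 genes x 2 cells with made-up size factors
counts = np.array([[0, 3], [1, 9]])
logged = log_normalize(counts, size_factors=[0.5, 1.5])
```

Either way, the values handed to MAST are on a log scale, which is what the normality remark in the Finak et al. paper refers to.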

So, the developer can probably provide a better answer than I can. However, I can see that the input to the lrTest() function is a specific type of object ("LMlike or subclass", so not a data frame or matrix of counts), and the zlm() function (or the LRT() function) takes a "SingleCellAssay" object as input. So, I think there are at least 3 strategies for p-value calculations, but the code for differential expression with any particular strategy would be similar regardless of which upstream normalization is used.
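To illustrate what a likelihood-ratio test on "condition" does once "cngeneson" is in the model, here is a simplified numpy sketch. It covers only a Gaussian linear model (roughly the continuous part of MAST's hurdle model, which also tests a discrete detection component and combines the two); it is not MAST's implementation, and all names and values are hypothetical.

```python
import math

import numpy as np


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square(1) variable: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def gaussian_lrt(y, X_full, X_reduced):
    """LRT between nested Gaussian linear models differing by one column.

    With maximum-likelihood variance estimates, the statistic is
    n * log(RSS_reduced / RSS_full), asymptotically chi-square(1) here.
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)

    df = X_full.shape[1] - X_reduced.shape[1]
    if df != 1:
        raise ValueError("this sketch only handles dropping a single column")
    stat = n * math.log(rss(X_reduced) / rss(X_full))
    return stat, df, chi2_sf_1df(stat)


# Made-up log-expression of one gene in six cells where it was detected
condition = np.array([0, 0, 0, 1, 1, 1], dtype=float)
cngeneson = np.array([-1.0, 0.0, 1.0, -1.0, 0.0, 1.0])
y = np.array([1.0, 1.5, 2.1, 2.0, 2.4, 3.1])

ones = np.ones_like(y)
X_full = np.column_stack([ones, condition, cngeneson])  # ~ condition + cngeneson
X_reduced = np.column_stack([ones, cngeneson])          # ~ cngeneson
stat, df, p = gaussian_lrt(y, X_full, X_reduced)
```

The test compares models with and without "condition" while keeping "cngeneson" in both, and the mechanics are the same whatever normalization produced y.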

score 0 · Answer 2 · 2018-08-29

Charles, thank you for the answer. At this point, I think that since the whole premise of the package authors, as they state in the paper, is that "CDR is a proxy for unobserved nuisance factors that should be explicitly modeled", the "cngeneson" covariate should be kept in the model. It was not quite clear to me how the prior normalization affects the procedure, but since, again, the authors do not mention this caveat, I assume it should be fine.
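A contrived numpy sketch of why modeling CDR can matter regardless of the input normalization: if cells in one condition detect more genes and a gene's expression tracks detection rate rather than condition, a model without "cngeneson" attributes that difference to condition. The helper and all values are hypothetical toy inputs, and ordinary least squares stands in for the continuous part of the model.

```python
import numpy as np


def condition_coefficient(y, condition, covariates=()):
    """OLS coefficient on `condition`, optionally adjusting for covariates."""
    y = np.asarray(y, dtype=float)
    columns = [np.ones_like(y), np.asarray(condition, dtype=float)]
    columns += [np.asarray(c, dtype=float) for c in covariates]
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[1])  # column 1 is the condition indicator


# Toy setup: condition-1 cells have higher detection rates, and expression
# depends only on detection rate (y = 1 + 2 * cngeneson), not on condition.
condition = np.array([0, 0, 0, 1, 1, 1])
cngeneson = np.array([-1.0, -0.5, 0.0, 0.0, 0.5, 1.0])
y = 1.0 + 2.0 * cngeneson

unadjusted = condition_coefficient(y, condition)             # ~ condition
adjusted = condition_coefficient(y, condition, [cngeneson])  # ~ condition + cngeneson
```

Here the unadjusted model reports a condition effect of 2 that is entirely explained by detection rate, while the adjusted model estimates essentially 0.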