Question: For mean expression/fold-change calculations in scRNA-seq, should I use all cells or only expressing cells?
achits20 wrote (14 months ago):

I'm running a differential test in Monocle. Its documentation shows that differentialGeneTest() reports the genes that differ according to your model, but it doesn't tell you which specific genes go up in particular groups. Per their documentation: "We could also simply compute summary statistics such as mean or median expression level on a per-CellType basis to see this, which might be handy if we are looking at more than a handful of genes."

This makes sense, and I have a calculated normalized expression matrix. My main question is: does one normally use all single cells to calculate the mean expression, including the cells with no detectable level, or just the expressing cells? For example, consider a scenario where condition 1 has 400 total cells, 300 of which express geneA, and condition 2 has 200 total cells, only 50 of which express geneA. If I'm calculating a fold change for geneA, do I compare

mean expression (all 400 cells) / mean expression (all 200 cells), OR

mean expression (300 expressing cells) / mean expression (50 expressing cells)?
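To see how differently the two options can behave, here is a minimal Python sketch using the cell counts from the scenario above. The expression values are made up for illustration (every expressing cell is assigned 2.0 in both conditions), and `fold_changes` and its `detection_threshold` parameter are hypothetical names, not part of Monocle.

```python
def mean(values):
    return sum(values) / len(values)

def fold_changes(cond1, cond2, detection_threshold=0.0):
    """Return (FC over all cells, FC over expressing cells only).

    A cell counts as 'expressing' if its value exceeds detection_threshold
    (a hypothetical cutoff; 0.0 means any nonzero value). Raises
    ZeroDivisionError if a condition has no expressing cells.
    """
    expr1 = [x for x in cond1 if x > detection_threshold]
    expr2 = [x for x in cond2 if x > detection_threshold]
    fc_all = mean(cond1) / mean(cond2)
    fc_expressing = mean(expr1) / mean(expr2)
    return fc_all, fc_expressing

# Hypothetical values: expressing cells are identical (2.0) in both
# conditions; only the fraction of expressing cells differs.
cond1 = [2.0] * 300 + [0.0] * 100   # 400 cells, 300 expressing
cond2 = [2.0] * 50 + [0.0] * 150    # 200 cells, 50 expressing

fc_all, fc_expressing = fold_changes(cond1, cond2)
print(fc_all, fc_expressing)  # 3.0 1.0

# Mean over all cells = (fraction expressing) * (mean over expressing cells).
print(mean(cond1), (300 / 400) * mean([x for x in cond1 if x > 0]))  # 1.5 1.5
```

Because non-expressing cells contribute zeros, the all-cells ratio mixes "how many cells express geneA" with "how much they express it", while the expressing-cells ratio reports only the second part.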

I can see how both could introduce bias, so I wonder which approach is standard in the field.
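The per-CellType summary statistics suggested in the Monocle documentation quoted above can be sketched in plain Python. This is not Monocle's API: `summarize_by_group`, the dict-of-lists layout, and the toy values are hypothetical, and the mean and median are taken over every cell in a group, zeros included.

```python
from collections import defaultdict
from statistics import mean, median

def summarize_by_group(expr, groups):
    """Per-gene mean and median expression for each group of cells.

    expr: dict mapping gene name -> list of normalized values, one per cell
    groups: list of group labels (e.g. cell type), in the same cell order
    """
    summary = {}
    for gene, values in expr.items():
        by_group = defaultdict(list)
        for value, group in zip(values, groups):
            by_group[group].append(value)
        summary[gene] = {
            g: {"mean": mean(v), "median": median(v)} for g, v in by_group.items()
        }
    return summary

# Hypothetical toy data: three cells of type "A", three of type "B".
expr = {"geneA": [0.0, 2.0, 4.0, 0.0, 0.0, 1.0]}
groups = ["A", "A", "A", "B", "B", "B"]
print(summarize_by_group(expr, groups))
```

For non-negative expression values, the median over all cells is 0 whenever fewer than half of the cells in a group express the gene, as for group B here.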

Tags: monocle, next-gen, scrna-seq
Charles Warden (Duarte, CA) wrote (14 months ago):

It is probably a good idea to do some extra QC filtering (for example, requiring a minimum number of detected genes per cell and a sufficiently low percentage of mitochondrial reads), but the criteria that can and should be applied will likely vary between projects.
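A minimal sketch of these two QC criteria in plain Python is below. The thresholds are left to the caller because, as noted, they vary between projects; `qc_filter` is a hypothetical helper, and identifying mitochondrial genes by an "MT-" name prefix is an assumption (it matches the usual human gene-symbol convention, but other annotations differ).

```python
def qc_filter(cells, min_genes, max_mito_pct, mito_prefix="MT-"):
    """Keep cells that pass two simple QC criteria.

    cells: dict mapping cell barcode -> dict of gene name -> raw count
    min_genes: minimum number of genes with a nonzero count
    max_mito_pct: maximum percentage of counts from mitochondrial genes
    mito_prefix: assumed name prefix marking mitochondrial genes
    """
    kept = {}
    for barcode, counts in cells.items():
        total = sum(counts.values())
        n_genes = sum(1 for c in counts.values() if c > 0)
        mito = sum(c for g, c in counts.items()
                   if g.upper().startswith(mito_prefix.upper()))
        # Empty cells are treated as 100% mitochondrial, so they fail
        # any mitochondrial threshold below 100%.
        mito_pct = 100.0 * mito / total if total > 0 else 100.0
        if n_genes >= min_genes and mito_pct <= max_mito_pct:
            kept[barcode] = counts
    return kept

# Hypothetical toy data and thresholds.
cells = {
    "cell1": {"GAPDH": 10, "ACTB": 5, "MT-CO1": 1},  # 3 genes, 6.25% mito
    "cell2": {"GAPDH": 3, "MT-CO1": 7},              # 2 genes, 70% mito
    "cell3": {"GAPDH": 4, "ACTB": 0},                # 1 gene detected
}
print(sorted(qc_filter(cells, min_genes=2, max_mito_pct=10.0)))  # ['cell1']
```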

I'm not sure how easy it is to do this with Monocle (or what specific functions to recommend). However, some other potential options would be:

1) Use raw counts for p-values, with relatively standard RNA-Seq methods such as edgeR or limma-voom (or you may be able to try scRNA-Seq-specific methods such as MAST), and use CPM values for calculating fold changes (or some other normalized count, if the goal is to have something comparable to what the differential expression program reports).

2) Use Seurat scaled expression for the fold-change calculation, and potentially use standard statistical tests (such as lm() for linear regression or aov() for ANOVA) to test for differential expression between groups of cells.
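The fold-change half of option 1 can be sketched in Python: convert raw counts to CPM, average each group, and take a log2 ratio. This is only an illustration under stated assumptions: the function names are hypothetical, and the pseudocount of 1 and averaging over all cells in each group are choices made here rather than what any particular package does. The p-values would still come from a package such as edgeR, limma-voom, or MAST.

```python
import math

def cpm(cells):
    """Counts per million: scale each cell's raw counts by its total count."""
    result = []
    for counts in cells:
        total = sum(counts.values())
        result.append({g: 1e6 * c / total for g, c in counts.items()})
    return result

def log2_fold_change(group1, group2, gene, pseudocount=1.0):
    """log2((mean CPM in group1 + pseudocount) / (mean CPM in group2 + pseudocount)).

    Means are taken over all cells in each group; a cell with no entry
    for the gene counts as 0. The pseudocount avoids division by zero.
    """
    m1 = sum(cell.get(gene, 0.0) for cell in group1) / len(group1)
    m2 = sum(cell.get(gene, 0.0) for cell in group2) / len(group2)
    return math.log2((m1 + pseudocount) / (m2 + pseudocount))

# Hypothetical raw counts for two cells per group.
group1 = cpm([{"geneA": 30, "geneB": 70}, {"geneA": 10, "geneB": 90}])
group2 = cpm([{"geneA": 5, "geneB": 95}, {"geneB": 100}])
print(log2_fold_change(group1, group2, "geneA", pseudocount=0.0))  # 3.0
```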


datascientist28390 replied (14 months ago):

Monocle runs something similar to DESeq, but it has no equivalent of DESeq's results() function, which is what reports fold changes. As I said above, I already have a normalized counts table, and since I'm running the data through the program, I have of course already quality-filtered it. I'm not going to publish these fold changes; I just want a value to sort genes by. So my question remains: does one normally calculate the mean expression over all single cells, including those with no detectable expression, or only over the expressing cells?
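One way to get a sortable value while keeping the detection-rate difference visible is to rank genes by the all-cells fold change and report the fraction of expressing cells in each condition alongside it, similar in spirit to the pct.1/pct.2 columns that Seurat's FindMarkers reports. The sketch below is a hypothetical helper (`rank_genes`, its pseudocount, and the toy data are assumptions, not from this thread), and it expects normalized values on a linear scale, not log-transformed.

```python
import math

def rank_genes(expr1, expr2, pseudocount=1.0):
    """Rank genes by log2 fold change of mean expression over ALL cells.

    expr1, expr2: dict gene -> list of normalized (non-log) values per cell
    Returns (gene, log2FC, fraction expressing in 1, fraction expressing in 2),
    sorted from most up in condition 1 to most up in condition 2.
    """
    rows = []
    for gene in expr1:
        v1, v2 = expr1[gene], expr2[gene]
        m1 = sum(v1) / len(v1)
        m2 = sum(v2) / len(v2)
        lfc = math.log2((m1 + pseudocount) / (m2 + pseudocount))
        pct1 = sum(1 for x in v1 if x > 0) / len(v1)
        pct2 = sum(1 for x in v2 if x > 0) / len(v2)
        rows.append((gene, lfc, pct1, pct2))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows

# Hypothetical toy data: four cells per condition.
expr1 = {"geneA": [3.0, 3.0, 3.0, 0.0], "geneB": [1.0, 0.0, 0.0, 0.0]}
expr2 = {"geneA": [0.0, 0.0, 1.0, 0.0], "geneB": [1.0, 1.0, 1.0, 1.0]}
for gene, lfc, pct1, pct2 in rank_genes(expr1, expr2):
    print(f"{gene}\t{lfc:.2f}\t{pct1:.2f}\t{pct2:.2f}")
```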
Powered by Biostar version 2.3.0