Is it appropriate to use counts = counts + 1 when using DESeq2 on scRNAseq data?
3.0 years ago
jrleary ▴ 190

I'm attempting to perform differential expression analysis using the FindAllMarkers function included in Seurat. I'd like to use the DESeq2 test, as I've had good results with that package on bulk RNAseq in the past. However, when I try to run it, it throws an error, apparently because there are so many zero values in my counts matrix. Would it be statistical malpractice to simply add 1 to each matrix entry? I'm pretty sure this is done in differential expression analysis of bulk RNAseq, but it's been a while, so I can't remember.
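
To make the concern behind the question concrete, here is a minimal sketch in plain Python of how a blanket `counts + 1` changes fold changes unevenly. It is not what DESeq2 or Seurat computes internally; the function name `log2_fold_change` and the example counts are made up for illustration. Two hypothetical genes both double between groups, but the added 1 shrinks the low-count gene's fold change far more than the high-count gene's.

```python
import math

def log2_fold_change(a, b, pseudocount=0):
    """log2(b / a) after adding the same pseudocount to both counts."""
    return math.log2((b + pseudocount) / (a + pseudocount))

# Hypothetical mean counts for two genes, each doubling between groups.
low_gene = (1, 2)
high_gene = (100, 200)

for name, (a, b) in [("low", low_gene), ("high", high_gene)]:
    raw = log2_fold_change(a, b)
    shifted = log2_fold_change(a, b, pseudocount=1)
    print(f"{name}: raw={raw:.3f}, with +1={shifted:.3f}")
```

Since most scRNAseq counts are small, this kind of distortion would touch most of the matrix, which is one reason to look for the actual cause of the error first.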

sc-rnaseq differential expression • 2.2k views

You should remove genes with zero counts across the board (i.e. no expression in any sample). That removes irrelevant data points and is not "statistical malpractice".

EDIT: my statement applies to bulk RNAseq. I’m not sure if it’s relevant to scRNA-Seq. Sorry!


Do you know if there's any way to do this within Seurat? This post seems to indicate that they don't have any gene filtering functionality in their package. Or could you recommend another way to remove lowly expressed genes?


I edited my comment. I’m not sure my comment on DESeq2 applies to single cell experiments.


There are tools out there that impute values for zeros in RNAseq. I'm not an expert, so I make no claims about how well they work, or which is best, but you might like to check out:

3.0 years ago
igor 13k

There is no need to call DESeq2 yourself. FindAllMarkers has an argument test.use where you can specify "DESeq2". That should take care of any necessary data transformation.

If you are really curious, you can see exactly what it's doing by checking the source code for DESeq2DETest: https://github.com/satijalab/seurat/blob/49a1be0427f2f26a531eb468ba93eeb18d8a2edb/R/differential_expression.R#L930-L962


Yeah, that's what I'd been doing, but I get an error saying that "every gene contains at least one zero, cannot compute log geometric means". I assume this is because there are rows in my counts matrix that are entirely zero, i.e. that the gene isn't expressed in any of the cells?
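
The wording of the error points at something slightly different from all-zero rows. Below is a simplified sketch in plain Python, not DESeq2's actual code, of the geometric-mean step in median-of-ratios size factor estimation: a gene is usable only if it has no zero in any cell. The function names and toy matrices are assumptions for illustration. In a sparse single-cell matrix every gene can have at least one zero, so no usable genes remain, and dropping all-zero genes does not change that.

```python
import math

def log_geometric_means(counts):
    """Per-gene mean of log counts across cells; -inf if any count is zero."""
    out = []
    for row in counts:
        if any(c == 0 for c in row):
            out.append(float("-inf"))
        else:
            out.append(sum(math.log(c) for c in row) / len(row))
    return out

def usable_genes(counts):
    """Indices of genes whose log geometric mean is finite."""
    means = log_geometric_means(counts)
    return [i for i, m in enumerate(means) if math.isfinite(m)]

# Toy matrices (made-up numbers): rows are genes, columns are cells.
bulk_like = [[10, 12, 9], [0, 3, 5], [200, 180, 220]]
sparse = [[0, 2, 1], [3, 0, 0], [0, 0, 0], [1, 1, 0]]

print(usable_genes(bulk_like))  # genes without any zero
print(usable_genes(sparse))     # nothing left to estimate size factors from

# Dropping the all-zero gene leaves three genes, none of them usable.
filtered = [row for row in sparse if any(row)]
print(len(filtered), usable_genes(filtered))
```

DESeq2's `estimateSizeFactors` documents a `type = "poscounts"` option intended for data where every gene has some zeros; whether that option can be reached through FindAllMarkers is not something this thread establishes.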


Does it give you an error with other tests or just DESeq2?


No, using e.g. the Wilcoxon test (or any test that, unlike DESeq2, is not based on a model with a log link function) works fine.


That is odd. I would post this as an issue on Seurat's GitHub site: https://github.com/satijalab/seurat/issues