Differential expression testing in Seurat
21 months ago
suvratha ▴ 60

If I've understood Seurat correctly, its initial steps scale and center the data with the ScaleData() function so that the data resembles a normal distribution. If that is the case, why is the default test for differential expression the Wilcoxon test, which is non-parametric? Would it not be better to use DESeq2 instead and trust its results?

Please do correct me if I'm going wrong anywhere in my understanding.

Thank you. Suvi

RNA-Seq seurat R • 2.4k views

If you treat the hundreds of cells you have per sample as replicates, there shouldn't be much need for the sophisticated modelling that DESeq2 uses to overcome the typical limitations of bulk RNA-seq data (namely, the lack of replicates). A t-test (or, alternatively, a Wilcoxon test) usually works fine if you have hundreds of replicates per gene. That being said, DESeq2 would use the raw read counts, not the scaled data.
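
To make the "cells as replicates" point concrete, here is a minimal Python sketch (standard library only) of a two-sided Wilcoxon rank-sum test, using the normal approximation with a tie correction and no continuity correction. It is not Seurat's implementation; the function names and the simulated "expression" values are made up for illustration.

```python
import math
import random


def midranks(values):
    """Return 1-based ranks (ties get the average rank) and the tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test; returns (U statistic of x, approximate p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = midranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u <= 0:  # every value identical: no evidence of a shift
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical data: 300 cells per group, one gene, values on a log-normalized scale.
rng = random.Random(0)
group_a = [rng.gauss(1.0, 1.0) for _ in range(300)]
group_b = [rng.gauss(2.0, 1.0) for _ in range(300)]
u_demo, p_demo = rank_sum_test(group_a, group_b)
print(f"U = {u_demo:.1f}, p = {p_demo:.3g}")
```

With hundreds of observations per group, even this simple rank-based test has plenty of power, which is the situation described above.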

So which results should I report: the ones from the Wilcoxon test or the ones from DESeq2? Also, DESeq2 gives me far more differentially expressed genes than the Wilcoxon test, so I don't know which ones to trust. For example, the genes I'm looking into are detected only when I use DESeq2 and not with the Wilcoxon test, so I don't know what to do.

In silico, there isn't that much you can do at the single-gene level, but if you're interested in just a single gene, I would strongly recommend looking at the expression pattern of that gene in the groups you're comparing. Get an idea of why the gene seems to be borderline DE, given that one method misses it. Looking at both the raw counts and the normalized data may be helpful.

The only way to know whether your gene is biologically important for whatever conditions you're looking at is to set up an appropriate experiment.

I see, thank you for the explanation. My question still remains: which one do I pick for differential expression testing, the Wilcoxon test or DESeq2?

My argument would be that it does not matter. It is more important to understand why the tests disagree for your specific gene IMO.

The only way to know that would be by doing what you suggested in your previous comments?

I'm sure there are more ways, but that's how I would start going about it, yes.

21 months ago
ATpoint 57k

The differential testing is performed on the normalized count data, not on the Z-transformed data. FAQ 4 and several GitHub issues briefly cover this: https://satijalab.org/seurat/faq
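
As a side note on the original worry about mixing scaling with a non-parametric test: Z-scoring a gene across cells is a monotone (affine) transformation, so it leaves the ranks that a Wilcoxon test relies on unchanged. The Python sketch below illustrates this with made-up counts and depths. `log_normalize` follows the log1p(count / depth * 10,000) form of Seurat's default log-normalization, and `z_score` only mimics the idea behind ScaleData() (per-gene centering and scaling); neither is Seurat's actual code, and the clipping of extreme scaled values that ScaleData() applies is ignored here.

```python
import math
import statistics


def log_normalize(count, depth, scale_factor=10_000):
    """Depth-normalize one count and take log1p (illustrative helper)."""
    return math.log1p(count / depth * scale_factor)


def z_score(values):
    """Center to mean 0 and scale to standard deviation 1 across cells."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def rank_order(values):
    """Indices of the cells sorted from lowest to highest value."""
    return sorted(range(len(values)), key=lambda i: values[i])


# Hypothetical data: one gene measured in six cells.
counts = [0, 3, 1, 7, 2, 12]
depths = [1500, 4000, 2500, 5000, 3000, 6000]

normalized = [log_normalize(c, d) for c, d in zip(counts, depths)]
scaled = z_score(normalized)

# The ordering of cells is the same before and after Z-scoring.
same_order = rank_order(normalized) == rank_order(scaled)
print("same cell ordering after scaling:", same_order)
```

So a rank-based test would give the same answer on either representation of a single gene; the scaled values matter for steps that compare genes with each other, not for this test.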

So then, what's the point of the ScaleData() function?

Mostly for visualization purposes

This is not correct; ideally, differential expression is tested on raw data. Also see Luecken et al. 2018.

The Wilcoxon test that the top-level question is about is certainly not done on raw data. That would be confounded by sequencing depth and as such would be meaningless. Testing is done on raw data if you use model-based approaches such as DESeq2, where the scaling factors are used as offsets in the (G)LMs.
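
A toy Python illustration of the depth confounding, with made-up numbers: both groups express the gene at the same fractions of their libraries, but group B is sequenced five times deeper. The helper names are hypothetical, and the normalization only imitates the log1p(count / depth * 10,000) form. In a DESeq2-style model the same depth information would instead enter the mean as a per-sample size factor, mu_ij = s_j * q_ij, which is why such models can work on raw counts.

```python
import math


def mann_whitney_u(x, y):
    """U statistic of x versus y: pairs with x > y count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def log_normalize(count, depth, scale_factor=10_000):
    """Depth-normalize one count and take log1p (illustrative helper)."""
    return math.log1p(count / depth * scale_factor)


# Same relative expression in both groups, different sequencing depth.
depth_a, depth_b = 2_000, 10_000
raw_a = [1, 2, 3, 4]      # fractions 0.0005 .. 0.002 of a 2,000-count library
raw_b = [5, 10, 15, 20]   # the same fractions of a 10,000-count library

norm_a = [log_normalize(c, depth_a) for c in raw_a]
norm_b = [log_normalize(c, depth_b) for c in raw_b]

u_raw = mann_whitney_u(raw_a, raw_b)     # raw counts: groups look completely separated
u_norm = mann_whitney_u(norm_a, norm_b)  # normalized: groups are indistinguishable
u_null = len(raw_a) * len(raw_b) / 2     # value of U expected when there is no difference

print("U on raw counts:       ", u_raw)
print("U on normalized values:", u_norm)
print("U expected under no difference:", u_null)
```

On raw counts the rank-based statistic sits at its extreme purely because of depth; after normalization it sits at the no-difference value.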