Single cell DEG analysis. Wildly different results from different FindMarkers test
1
0
Entering edit mode
15 months ago
jevanveen ▴ 20

Hello and thank you in advance for your help.

I am doing a single cell RNA seq experiment and analyzing data with Seurat. I have four treatment groups, which I integrated, then identified clusters of different types of cells in the brain. Everything looked exactly right at this point. Then I wanted to test for DEGs within each cluster. Using the standard Seurat workflow of switching default assay to "RNA" and then using FindMarkers(), I get kind of strange results. Here is a sample volcano plot from one cluster:

If I switch to a statistical method in FindMarkers (like negbinom) that uses RNA counts instead of scaled data, things look much more as I would expect. Here is the same cluster, tested using counts as data.

I have triple checked that I am performing the Seurat workflow as per the tutorial here: https://satijalab.org/seurat/v3.1/immune_alignment.html

It just seems so odd to me that when I do DEG testing the apparently correct way, I get huge LogFC's and also a weird looking volcano plot with many many significant (padj < .05) DEGs with logFC close to zero.

Has anyone encountered a similar issue? Any help would be great. Thank you!

rna-seq seurat • 3.3k views
2
Entering edit mode

If I switch to a statistical method in FindMarkers (like negbinom) that uses RNA counts instead of scaled data, things look much more as I would expect. Here is the same cluster, tested using counts as data.

FindMarkers does not use scaled data. For most test, such as "wilcox", the default slot is "data", which is normalized counts. Using "counts" (raw counts) does not make sense in that case.

0
Entering edit mode

Sorry Igor, you are of course correct that the "data" slot is log normalized, and NOT scaled.

But when I do it the correct way, using the normalized data slot, things look wrong. When I use counts, which is only recommended for tests like negbinom or DESeq2, the results look much more correct, irrespective of the statistical test that I use.

I was mostly wondering if people had encountered similar results - I haven't seen people do many volcano plots on single cell data.

0
Entering edit mode

You posted 4 commands and 2 plots. I am not sure which command corresponds to the first plot, but the logFCs are above 20. Something seems wrong there regardless of the test used. The test should only define the p-value.

1
Entering edit mode

You really ought to spell out the two different findMarkers commands

0
Entering edit mode

Right you are! Commands that will give data producing the first plot:

FindMarkers(all.cells.integrated, ident.1 = "Neuron_LowEarthOrbit", ident.2 = "Neuron_Underwater", test.use = "wilcox")
FindMarkers(all.cells.integrated, ident.1 = "Neuron_LowEarthOrbit", ident.2 = "Neuron_Underwater", test.use = "LR")


Commands that will give the second plot:

FindMarkers(all.cells.integrated, ident.1 = "Neuron_LowEarthOrbit", ident.2 = "Neuron_Underwater", test.use = "negbinom")
FindMarkers(all.cells.integrated, ident.1 = "Neuron_LowEarthOrbit", ident.2 = "Neuron_Underwater", test.use = "wilcox", slot = "counts")

0
Entering edit mode

How could you make the volcano plot? In the Seurat integration tutorial, there no method to draw this.

0
Entering edit mode

Please do not add answers unless you're answering the top level question. Use Add Comment or Add Reply as appropriate. I've moved your post to a comment this time.

1
Entering edit mode
15 months ago
jevanveen ▴ 20

I partly figured this out and thought I would update in case someone else comes looking with the same issue.

It seems that I was getting crazy LogFC's and weird looking volcanoes because of a problem with the normalization of my data. For these analyses I had used the Seurat "SCT integration" pipeline as outlined here: https://satijalab.org/seurat/v3.2/integration.html

I probably did something wrong, but after triple checking the workflow, I couldn't find it. When I switched back to an integration workflow that includes log normalization, rather than SCT normalization, everything appears to be fixed.

So in the end, I am unsure of why the SCT integration pipeline failed on me, but switching to log normalization was the solution to my problem.

0
Entering edit mode

SCT Normalization stores the normalized counts in the SCT assay under the counts slot. Because of this, the counts slot is not changed in the RNA assay.

0
Entering edit mode

To avoid any potential confusion, you can specify both slot and assay in FindMarkers.

0
Entering edit mode

Just for your information.

Seurat lab suggested that DE analysis on obj@assay$RNA@data, which is normalized data (not scaled). So you are right, you should NormalizeData and then FindMarkers. However, Seurat 4 (not sure which small version exactly) starts to promotes DE analysis on obj@assay$SCT@scale.data, which is person residual.

My experience is to try both and take the one that fits your goal because Seurat said both are not incorrect.