Question

Why after normalization of scRNA-Seq data by DESeq2 the p-value of a large number of genes equal zero? How to deal with them?

0

Entering edit mode

22 months ago

Zahra ▴ 110

Hi all

I have two Seurat object for scRNA-Seq and extract the raw count from the object:

#crating two matrix for raw counts
ccl4_count <- as.matrix(ccl4_list@assays[["RNA"]]@counts)
ctrl_count <- as.matrix(ctrl_list@assays[["RNA"]]@counts)

Then I use the DESeq2 package for normalization and DEG analysis:

#creating one expression matrix
mat.ctrl_ccl4 <- cbind(ctrl_count,ccl4_count)

#grouping 
gr <- factor(c(rep("ctrl", ncol(ctrl_count)) , rep("ccl4", ncol(ccl4_count))))
coldata <- data.frame(group=gr)

dds <- DESeqDataSetFromMatrix( mat.ctrl_ccl4 , coldata, design= ~group)

#the script bellow takes a long time :|
dds <- DESeq(dds)
normalizedCount <- log2(1+counts(dds, normalized=TRUE))

#finding DEG
dif <- data.frame(results(dds, c("group", "ccl4" , "ctrl")))

That is all my script. Eventually in the dif table, I have about 2000 genes with 0 p-value and now I don't know how to deal with these genes. They make the volcano plot seem incorrect.

enter image description here

The other problem is that normalization of single-cell data by DESeq2 is really time-consuming because of the large number of cells.

Does anyone have any idea of how to efficiently perform DEG analysis for single-cell analysis? Thanks for any help.

scRNA-Seq DEG DESeq2 volcano Seurat • 1.2k views

ADD COMMENT • link updated 22 months ago by leaodel ▴ 190 • written 22 months ago by Zahra ▴ 110

score 1 · Answer 1 · 2022-06-17

1

Entering edit mode

22 months ago

tomas4482 ▴ 390

First, scRNA-seq has techinical limitation and sparsity is one of the biggest challenge. scRNA raw count is a sparse matrix. If you use raw count from all cells (meaning you take every cell as an individual sample) as input for DESeq2, many genes in many samples may not have any expression. That's why you get a wierd volcano plot.

Second, DEG test in single cell data could be made by many other algorithms. DESeq2 is designed for bulk data.

ADD COMMENT • link 22 months ago by tomas4482 ▴ 390

1

Entering edit mode

The "weird" plot is based on pvalues that are near or identical to the machine limit of displaying numericals. You can get that with

> .Machine$double.xmin
[1] 2.225074e-308

Not much to do about that other than in any case using something more appropriate for scRNA-seq data. Either the Wilcox test or something like limma-trend based on the logCPMs. This here is a great read to get started:

https://bioconductor.org/books/release/OSCA/

ADD REPLY • link 22 months ago by ATpoint 81k

1

Entering edit mode

Pseudobulk methods have been showing promise for DE in single-cell. Check this paper out

ADD REPLY • link 22 months ago by leaodel ▴ 190