I am trying to perform a more faster way of differential analysis between KO and WT single cell data. I have intergated the data using
Seurat pipeline and the differential expression using
Seurat implementation. As I have about 4k cells in each clusters, it is taking forever. Meanwhile I found the
edgeR and make it working based on consolidating many biostar discussions and the
edgeR manual as below
## subset Seurat object for the cluster seurat_object_cellfor_edgeR <- subset(seurat_object, subset = seurat_cluster == 0) # count matrix counts <- as.matrix(seurat_object_cellfor_edgeR@assays$RNA@counts) counts <- counts[Matrix::rowSums(counts >= 1) >= 1, ] # subset the meta data fro filtered gene/cells metadata <- seurat_object_cellfor_edgeR@meta.data metadata <- metadata[,c("nCount_RNA", "cell_info")] metadata <- metadata[colnames(counts),] ### Make single cell experiment sce <- SingleCellExperiment(assays = counts, colData = metadata) ## convert to edgR object dge <- convertTo(sce, type="edgeR", assay.type = 1) meta_dge <- dge$samples meta_dge <- meta_dge[,c("lib.size","norm.factors")] meta_dge <- cbind(meta_dge, metadata) meta_dge$group <- factor(meta_dge$cell_info) meta_dge$group <- relevel(meta_dge$group, "KO") dge$samples <- meta_dge
dge <- calcNormFactors(dge) design <- model.matrix(~0+group, data=dge$samples) dge <- estimateDisp(dge, design = design) fit <- glmQLFit(dge, design = design)
Differential expression test
qlf_as.is <- glmQLFTest(fit) # top genes headqlf_as.is$table) logFC logCPM F PValue AL627309.1 -15.82462 8.083615 1355354.8 0 AL669831.5 -15.42607 8.111567 444435.6 0 LINC00115 -15.76849 8.087425 576319.8 0 FAM41C -15.66446 8.094247 501766.5 0 AL645608.3 -15.76752 8.087299 591389.5 0 AL645608.1 -15.83204 8.083359 1661128.7 0
and found to be all the genes are differentially regulated, no FDR and all p value is zero. So I explored the contrast for the testing and that gave me much more reasonable and expected results (but the FC is too low!)
my.contrasts <- makeContrasts(KO_vs_WT = groupKO-groupWT, levels=design) qlf.contrast <- glmQLFTest(fit, contrast=my.contrasts[,"KO_vs_WT"]) head(qlf.contrast$table) logFC logCPM F PValue AL627309.1 -0.02402289 8.083615 36.0058677 0.3631278 AL669831.5 -0.16083392 8.111567 6.8890345 0.1892858 LINC00115 0.02784262 8.087425 2.5886375 0.6827443 FAM41C -0.02251330 8.094247 0.4922016 0.8021113 AL645608.3 -0.02544709 8.087299 2.4978086 0.6936892 AL645608.1 0.03523809 8.083359 48.3970011 0.3426162
With this back ground my questions are
- Which one of this method is correct? If none of them, how to perform the differential testing?.
- Why the FDR is not present in any one of the model. Am I missing something?
- Can I include mitochondria read percentage as a covariant and is it require to include UMI count for the model?
Any help is highly appreciated!.