Question: EdgeR for single cell differential expression analysis
0
gravatar for kvshamsudheen
22 days ago by
kvshamsudheen30 wrote:

I am trying to perform a more faster way of differential analysis between KO and WT single cell data. I have intergated the data using Seurat pipeline and the differential expression using DESeq2 Seurat implementation. As I have about 4k cells in each clusters, it is taking forever. Meanwhile I found the edgeR and make it working based on consolidating many biostar discussions and the edgeR manual as below

## subset Seurat object for the cluster     
seurat_object_cellfor_edgeR <- subset(seurat_object, subset = seurat_cluster == 0)

# count matrix 
counts <- as.matrix(seurat_object_cellfor_edgeR@assays$RNA@counts)
counts <- counts[Matrix::rowSums(counts >= 1) >= 1, ]
# subset the meta data fro filtered gene/cells
metadata <- seurat_object_cellfor_edgeR@meta.data
metadata <- metadata[,c("nCount_RNA", "cell_info")]
metadata <- metadata[colnames(counts),]

### Make single cell experiment
sce <- SingleCellExperiment(assays = counts, 
                                    colData = metadata)

## convert to edgR object
dge <- convertTo(sce, type="edgeR", assay.type = 1)
meta_dge <- dge$samples
meta_dge <- meta_dge[,c("lib.size","norm.factors")]
meta_dge <- cbind(meta_dge, metadata)
meta_dge$group <- factor(meta_dge$cell_info)
meta_dge$group <- relevel(meta_dge$group, "KO")
dge$samples <- meta_dge

model fit

dge <- calcNormFactors(dge)
design <- model.matrix(~0+group, data=dge$samples)
dge <- estimateDisp(dge, design = design)    
fit <- glmQLFit(dge, design = design)

Differential expression test

qlf_as.is <- glmQLFTest(fit)
# top genes
 headqlf_as.is$table)
               logFC   logCPM         F PValue
AL627309.1 -15.82462 8.083615 1355354.8      0
AL669831.5 -15.42607 8.111567  444435.6      0
LINC00115  -15.76849 8.087425  576319.8      0
FAM41C     -15.66446 8.094247  501766.5      0
AL645608.3 -15.76752 8.087299  591389.5      0
AL645608.1 -15.83204 8.083359 1661128.7      0

and found to be all the genes are differentially regulated, no FDR and all p value is zero. So I explored the contrast for the testing and that gave me much more reasonable and expected results (but the FC is too low!)

my.contrasts <- makeContrasts(KO_vs_WT = groupKO-groupWT, levels=design)
qlf.contrast <- glmQLFTest(fit, contrast=my.contrasts[,"KO_vs_WT"])
head(qlf.contrast$table)
                 logFC   logCPM          F    PValue
AL627309.1 -0.02402289 8.083615 36.0058677 0.3631278
AL669831.5 -0.16083392 8.111567  6.8890345 0.1892858
LINC00115   0.02784262 8.087425  2.5886375 0.6827443
FAM41C     -0.02251330 8.094247  0.4922016 0.8021113
AL645608.3 -0.02544709 8.087299  2.4978086 0.6936892
AL645608.1  0.03523809 8.083359 48.3970011 0.3426162

With this back ground my questions are

  1. Which one of this method is correct? If none of them, how to perform the differential testing?.
  2. Why the FDR is not present in any one of the model. Am I missing something?
  3. Can I include mitochondria read percentage as a covariant and is it require to include UMI count for the model?

Any help is highly appreciated!.

ADD COMMENTlink modified 20 days ago • written 22 days ago by kvshamsudheen30

Can you post a snippet of what your design looks like? and also my.contrasts. I think without knowing what your design/test is, it's hard to get an understanding.

For starts, the glmQLFTest requires you to specify either a coef or a contrast.

eg: glmQLFTest(fit, contrast=my.contrasts) (there's no need to subset the my.contrasts object)

I think this is the reason why no fdr/pvalue is given. edgeR is unsure what to test.

Final point, can you try topTags function to view the results instead of head? Not sure if it will make a difference but this is the method I always use (and is noted in the manual)

ADD REPLYlink modified 21 days ago • written 21 days ago by Amar620

Thanks. Yes, the topTags gives the FDR, but I am not sure why I am getting too low fold change

head(design)
                                          groupKO             groupWT
    AAACCCAAGCGTCTCG                      1                   0
    AAACCCACAAGTTCCA                      1                   0
    AAACCCACAAGTTCGT                      1                   0
    AAACCCACAGCACGAA                      1                   0
    AAACGAAAGACTGTTC                      1                   0
    AAACGAAAGGTCACTT                      1                   0
my.contrasts
                  Contrasts
Levels            KO_vs_WT
  groupWT          -1
  groupKO           1
ADD REPLYlink modified 20 days ago • written 21 days ago by kvshamsudheen30
1

Thank you. Your design and contrasts are fine and you should perform the glmQLFTest as described above. Regarding your question about FC, not 100% sure sorry it's outside of my expertise. You should read these to get a better understanding: https://support.bioconductor.org/p/108762/ and https://www.rdocumentation.org/packages/edgeR/versions/3.14.0/topics/predFC

I take the logFC values produced by edgeR at face value, that is if you average the counts and take the difference between groups (and factor in lib differences). Of course as noted in by Prof Smyth, it's much more complicated than simply adding and dividing. In other words, you have small logFC values because the expression differences between those genes is small. Don't get fixated on the logFC values of genes. You should consider the overall "picture" of the expression profile between your groups. A small fold change can still have a remarkable effect on the biological system you're studying.

ADD REPLYlink modified 20 days ago • written 20 days ago by Amar620
3
gravatar for kvshamsudheen
20 days ago by
kvshamsudheen30 wrote:

For those who are still looking for the right answer, can have a look at these scripts used for the Soneson and Robinson 2018, Nat. Meth: PMID: 29481549 paper for comparing different DE methods for single cell RNA data set. Both of them are a great start for the same. Cheers!.

ADD COMMENTlink written 20 days ago by kvshamsudheen30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1663 users visited in the last hour