Question

Going from Seurat object to DESeq2 analysis

0

Entering edit mode

2.7 years ago

learner-MD ▴ 30

Hi all,

I'm still a novice R user, but with the use of Seurat vignettes, Harvard Chan Bioinformatics tutorials, and OSCA, I've been able to learn quite a bit about single-cell analysis and can reasonably work my way from raw fastq files to cluster annotation. However, I'm struggling with going from a merged Seurat object (containing replicates from two conditions) to generating a DESeq2 object to perform DEG analysis across conditions (between specific clusters). I've been going through the following tutorial:

https://github.com/hbctraining/scRNA-seq_online/blob/master/lessons/pseudobulk_DESeq2_scrnaseq.md

It's been excellent, but I don't think my understanding of base R is good enough to understand every step. I'm teaching myself more at this time, but is there a simpler way to get from a Seurat object to DESeq2 analysis? Or any example scripts anyone can provide? If so, I'd greatly appreciate it!

Part of the reason I want to do it this way rather than the FindMarkers() function is for processing time:

https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#recommendations-for-single-cell-analysis

Thanks in advance!

DEG single-cell DESeq2 scRNAseq Seurat • 5.3k views

ADD COMMENT • link updated 2.7 years ago by benformatics 3.9k • written 2.7 years ago by learner-MD ▴ 30

0

Entering edit mode

Do you want to run a pseudobulk analysis or treat each single cell as a replicate?

ADD REPLY • link 2.7 years ago by ATpoint 82k

0

Entering edit mode

Pseudobulk analysis. For example, I have 10 samples - 5 control and 5 disease, with approx. 5000 to 8000 cells per sample.

Everything I've learned/read about single-cell analysis so far says that we should not be treating each cell as a replicate.

ADD REPLY • link 2.7 years ago by learner-MD ▴ 30

score 1 · Answer 1 · 2021-07-30

I agree that in my experience and from what I read from others pseudobulk appears to be more robust as summing cells eliminates sparseness of data and therefore (I think) makes fold change estimations more robust/meaningful. I use scuttle for the aggregation. Here is an example to get familiar which starts from a SingleCellExperiment. There is a conversion vignette to convert Seurat to SingleCellExperiment which is probably the easiest here.

library(scuttle)

#/ example data and some mock clusters:
set.seed(123)
sce <- mockSCE()
sce$Cluster <- sample(LETTERS[1:5], ncol(sce), replace=TRUE)

#/ we have several clusters and two treatments
colData(sce)

DataFrame with 200 rows and 4 columns
         Mutation_Status  Cell_Cycle   Treatment     Cluster
             <character> <character> <character> <character>
Cell_001        negative           S      treat1           A
Cell_002        positive           S      treat1           D
Cell_003        negative          G0      treat2           D
Cell_004        negative          G1      treat2           A
Cell_005        positive         G2M      treat1           D
...                  ...         ...         ...         ...
Cell_196        positive          G0      treat2           D
Cell_197        negative         G2M      treat1           E
Cell_198        negative           S      treat1           C
Cell_199        negative         G2M      treat1           A
Cell_200        negative          G1      treat1           B

#/ aggregate by cluster and treatment:
sum_by <- 
  c("Cluster", "Treatment")

summed <- 
  aggregateAcrossCells(sce, id=colData(sce)[,sum_by])

#/ add rownames using the information from the colData:
colnames(summed) <- 
  apply(colData(summed)[,sum_by], 1, function(x) paste(x, collapse="_"))

head(assay(summed, "counts"))

          A_treat1 A_treat2 B_treat1 B_treat2 C_treat1 C_treat2 D_treat1 D_treat2
Gene_0001      218      695      667      384      383      512      809      307
Gene_0002     7029     7228     7088     6638     4355     7605     8013     5978
Gene_0003      251      354      254       68      209      612      389      106
Gene_0004     7586     5130     4950     3904     2640     4528    11968    10401
Gene_0005    19564    13940    10152    17015    10612    15363    18375    19609
Gene_0006     6024     3261     6874     4347     1411     3848     4875     2932
          E_treat1 E_treat2
Gene_0001      319      803
Gene_0002     7151     7435
Gene_0003      168      301
Gene_0004     7263     7271
Gene_0005    14327    10150
Gene_0006     6739     4977

This is now a SummarizedExperiment with the raw pseudobulk counts that can go into DESeq2. For a matrix of raw counts rather than the SummarizedExperiments use:

raw <- assay(summed, "counts")