Going from Seurat object to DESeq2 analysis
1
0
Entering edit mode
2.7 years ago
learner-MD ▴ 30

Hi all,

I'm still a novice R user, but with the use of Seurat vignettes, Harvard Chan Bioinformatics tutorials, and OSCA, I've been able to learn quite a bit about single-cell analysis and can reasonably work my way from raw fastq files to cluster annotation. However, I'm struggling with going from a merged Seurat object (containing replicates from two conditions) to generating a DESeq2 object to perform DEG analysis across conditions (between specific clusters). I've been going through the following tutorial:

https://github.com/hbctraining/scRNA-seq_online/blob/master/lessons/pseudobulk_DESeq2_scrnaseq.md

It's been excellent, but I don't think my understanding of base R is good enough to understand every step. I'm teaching myself more at this time, but is there a simpler way to get from a Seurat object to DESeq2 analysis? Or any example scripts anyone can provide? If so, I'd greatly appreciate it!

Part of the reason I want to do it this way rather than the FindMarkers() function is for processing time:

https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#recommendations-for-single-cell-analysis

Thanks in advance!

DEG single-cell DESeq2 scRNAseq Seurat • 5.3k views
ADD COMMENT
0
Entering edit mode

Do you want to run a pseudobulk analysis or treat each single cell as a replicate?

ADD REPLY
0
Entering edit mode

Pseudobulk analysis. For example, I have 10 samples - 5 control and 5 disease, with approx. 5000 to 8000 cells per sample.

Everything I've learned/read about single-cell analysis so far says that we should not be treating each cell as a replicate.

ADD REPLY
1
Entering edit mode
2.7 years ago
ATpoint 82k

I agree that in my experience and from what I read from others pseudobulk appears to be more robust as summing cells eliminates sparseness of data and therefore (I think) makes fold change estimations more robust/meaningful. I use scuttle for the aggregation. Here is an example to get familiar which starts from a SingleCellExperiment. There is a conversion vignette to convert Seurat to SingleCellExperiment which is probably the easiest here.

library(scuttle)

#/ example data and some mock clusters:
set.seed(123)
sce <- mockSCE()
sce$Cluster <- sample(LETTERS[1:5], ncol(sce), replace=TRUE)

#/ we have several clusters and two treatments
colData(sce)

DataFrame with 200 rows and 4 columns
         Mutation_Status  Cell_Cycle   Treatment     Cluster
             <character> <character> <character> <character>
Cell_001        negative           S      treat1           A
Cell_002        positive           S      treat1           D
Cell_003        negative          G0      treat2           D
Cell_004        negative          G1      treat2           A
Cell_005        positive         G2M      treat1           D
...                  ...         ...         ...         ...
Cell_196        positive          G0      treat2           D
Cell_197        negative         G2M      treat1           E
Cell_198        negative           S      treat1           C
Cell_199        negative         G2M      treat1           A
Cell_200        negative          G1      treat1           B

#/ aggregate by cluster and treatment:
sum_by <- 
  c("Cluster", "Treatment")

summed <- 
  aggregateAcrossCells(sce, id=colData(sce)[,sum_by])

#/ add rownames using the information from the colData:
colnames(summed) <- 
  apply(colData(summed)[,sum_by], 1, function(x) paste(x, collapse="_"))

head(assay(summed, "counts"))

          A_treat1 A_treat2 B_treat1 B_treat2 C_treat1 C_treat2 D_treat1 D_treat2
Gene_0001      218      695      667      384      383      512      809      307
Gene_0002     7029     7228     7088     6638     4355     7605     8013     5978
Gene_0003      251      354      254       68      209      612      389      106
Gene_0004     7586     5130     4950     3904     2640     4528    11968    10401
Gene_0005    19564    13940    10152    17015    10612    15363    18375    19609
Gene_0006     6024     3261     6874     4347     1411     3848     4875     2932
          E_treat1 E_treat2
Gene_0001      319      803
Gene_0002     7151     7435
Gene_0003      168      301
Gene_0004     7263     7271
Gene_0005    14327    10150
Gene_0006     6739     4977

This is now a SummarizedExperiment with the raw pseudobulk counts that can go into DESeq2. For a matrix of raw counts rather than the SummarizedExperiments use:

raw <- assay(summed, "counts")
ADD COMMENT

Login before adding your answer.

Traffic: 1957 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6