Question

Differentially Expressed Genes between two conditions (scRNA, single GEO dataset with multiple samples and no cell annotations)

0

Entering edit mode

7 months ago

prietto • 0

Hi guys,

My aim is to identify Differentially expressed genes from GEO183837. This dataset contains Single-cell transcriptome profiling of the human endometrium from patients with recurrent implantation failure, specifically 3 control samples and 6 RIF (diseased) samples.

I read all samples into seurat objects, merged all 9 samples. I then proceeded to do Quality filtering based of mitochondrial RNA percentage counts, and normalized the merged data, ran a PCA, and a UMAP. I checked for batch variation and observed technical variation across the 9 samples. I fixed this by splitting the data by Patients, and Normalizing, finding variable features before integrating the data by the FindIntegrationAnchors function. I again did a UMAP and the batch effects were gone.

Now, I have a integrated Seurat dataset containing 9 samples. I have annotated my metadata such that I have a Type column with dis and ctrl (for diseased and control) as possible entries. I also have a Patient column with pat1, pat2...pat9 as entries.

My aim is to find Differentially expressed genes between Healthy and Diseased (2 conditions). I don't have cell annotations, so the approach given in the documentation example is hard to follow for me. Here is what I was thinking might work:

seurat.integrated <- SetIdent(seurat.integrated, value=seurat.integrated@meta.data$Type)
Differentially_expressed<-FindMarkers(seurat.integrated, ident.1="ctrl",ident.2 = "dis",
                                      logfc.threshold = 0.25, test.use = "wilcox")

Logic: Directly brute force the FindMarkers function to find differentially expressed genes across diseased and control conditions. Is this the right approach, or what else can I do to get the right genes?

Best,
Prietto

scRNA-seq RNA-Seq Seurat • 1.0k views

ADD COMMENT • link updated 7 months ago by bk11 ★ 2.4k • written 7 months ago by prietto • 0

1

Entering edit mode

You can simply do Idents(seurat.integrated) <- "Type" instead of seurat.integrated <- SetIdent(seurat.integrated, value=seurat.integrated@meta.data$Type). Using FindMarkers function to find differentially expressed genes across diseased and control conditions is alright but pseudobulk approach is highly recommended to find DE genes. See this and this.

ADD REPLY • link 7 months ago by bk11 ★ 2.4k

0

Entering edit mode

Thank you. I tried doing a pseudobulk analysis by doing the following:

I have 9 patient ids in a column Patient in my metadata, and a condition (dis or ctrl) in a Type column in my metadata.

I ran the below code to get an AggregateExpressionMatrix

cts <-AggregateExpression(seurat.integrated,
                group.by=c("Patient", "Type"),
                assays='RNA',
                slot="counts",
                return.seurat = FALSE)

Next, I manipulated the resultant matrix to a format compatible with DESeq2 by using the below code:

colData <- data.frame(samples=colnames((cts)))
rownames(colData) <- NULL
colData <- colData%>%
mutate(condition=ifelse(grepl('ctrl', samples), 'Control', 'Diseased'))%>%
column_to_rownames(var='samples')

This gave me a matrix cts with 9 patient ids as column names and all genes as row labels. I also generated a colData matrix with patient ids as rows and condition as column. Now, I fed this data to a DESeq object.

dds <- DESeqDataSetFromMatrix(countData=cts,
                   colData=colData,
                   design= ~condition)
keep <-rowSums(counts(dds)) >=10
dds<-dds[keep,]
dds <- DESeq(dds)
resultsNames(dds)
res <- results(dds, name = "condition_Diseased_vs_Control")
EnhancedVolcano(res,
            lab = rownames(res),
            x = 'log2FoldChange',
            y = 'pvalue')

I filtered the genes using a adjusted pvalue cutoff of 0.05, but only a single gene is being reported. I think I must have messed up the analysis somewhere, but am confident about the processing and integration of the dataset. Where have I gone wrong/what should I do differently?

Best, Prietto

ADD REPLY • link 7 months ago by prietto • 0

0

Entering edit mode

Your code keep <- rowSums(counts(dds)) >=10 is not correct, please change it to-

keep <- rowSums( counts(dds) >= A ) >= B
dds <- dds[keep,]

This requires genes to have B or more samples with counts of A or more. It therefore filters out genes that have less than B samples with counts of A or more.

ADD REPLY • link 7 months ago by bk11 ★ 2.4k