Inferring cell identity/genotype in single cells with missing information
19 days ago
txema.heredia ▴ 110

Hi,

I have been asked to analyze a single-cell dataset with the following design:

  • Mouse; WT vs Fib4 mutant; 4 samples (2+2), 1xM + 1xF in each genotype.

  • 2 sequencing runs with 10x, each superloaded with 2 biological samples that were multiplexed using hashtag antibodies:

    • run #1:
      • sample 1: Male; Fib4 mutant
      • sample 2: Female; WT
    • run #2:
      • sample 3: Female; Fib4 mutant
      • sample 4: Male; WT

Unfortunately, the hashtag antibody signal turned out to be unusable in the sequencing data (it looked fine in the wet lab before sequencing, though). I've been left with 2 "samples" (12k + 6k cells) that each combine cells of both sexes and both genotypes. I am trying to salvage what I can from the analysis.

As each run was composed of one Male and one Female sample, I used the expression of sex-specific genes (F: Xist vs M: Ddx3y, Eif2s3y, Kdm5d, Uty) to assign cells to their sample of origin. I took the raw counts of both groups of genes and assigned a cell to a sex if it had >0 reads for that group; cells with reads in both groups ("both") or in neither ("none") were left unclassified. This resulted in 54% sex-classified cells for run #1, and 69% for run #2:

run   both     F     M   none
#1     130  2434  4187   5387
#2      95  3273   954   1796
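
For reference, the counting rule described above can be sketched in base R roughly as follows. This is a minimal illustration on a made-up genes-by-cells count matrix, not the exact code I ran: `counts`, the `Actb` row, and the helper `classify_sex()` are hypothetical names introduced only for this example.

```r
# Toy genes-by-cells raw count matrix (made-up values, stand-in for the real data).
counts <- matrix(
  c(3, 0, 0, 2,   # Xist
    0, 1, 0, 1,   # Ddx3y
    0, 0, 0, 0,   # Eif2s3y
    0, 2, 0, 0,   # Kdm5d
    0, 0, 0, 0,   # Uty
    5, 4, 7, 6),  # Actb (not sex-specific, ignored by the classifier)
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("Xist", "Ddx3y", "Eif2s3y", "Kdm5d", "Uty", "Actb"),
    paste0("cell", 1:4)
  )
)

female_genes <- "Xist"
male_genes   <- c("Ddx3y", "Eif2s3y", "Kdm5d", "Uty")

# classify_sex() is a hypothetical helper, not a Seurat function.
# A cell is "F" or "M" if it has >0 reads in exactly one gene group,
# "both" if it has reads in both groups, and "none" otherwise.
classify_sex <- function(counts, female_genes, male_genes) {
  f_reads <- colSums(counts[intersect(female_genes, rownames(counts)), , drop = FALSE])
  m_reads <- colSums(counts[intersect(male_genes, rownames(counts)), , drop = FALSE])
  sex <- ifelse(f_reads > 0 & m_reads > 0, "both",
         ifelse(f_reads > 0, "F",
         ifelse(m_reads > 0, "M", "none")))
  factor(sex, levels = c("both", "F", "M", "none"))
}

sex <- classify_sex(counts, female_genes, male_genes)
table(sex)
```

Keeping "both" as its own level makes it easy to see how many cells are likely doublets or carry ambient RNA from the other sample, rather than silently forcing them into one sex.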

Using this (VERY IMPERFECT) classification, and since each run contains only one sex per genotype, I was able to assign a genotype to the sex-classified cells.

From there, I merged both sequencing runs into a single Seurat object with 18k cells classified by genotype:

Fib4    WT    NA
7460  3388  7408
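
The run + sex to genotype assignment amounts to a lookup against the design. A minimal base R sketch, assuming per-cell metadata with hypothetical `run` and `sex` columns (the `cells` data frame below is made up for illustration):

```r
# Experimental design as described in the post: one row per (run, sex) pair.
design <- data.frame(
  run = c("run1", "run1", "run2", "run2"),
  sex = c("M", "F", "F", "M"),
  gt  = c("Fib4", "WT", "Fib4", "WT"),
  stringsAsFactors = FALSE
)

# Hypothetical per-cell metadata; "none" and "both" are unclassified cells.
cells <- data.frame(
  run = c("run1", "run1", "run2", "run2", "run1", "run2"),
  sex = c("M", "F", "F", "M", "none", "both"),
  stringsAsFactors = FALSE
)

# Look up each cell's (run, sex) pair in the design.
# Pairs that are not in the design (unclassified cells) become NA.
key <- match(paste(cells$run, cells$sex), paste(design$run, design$sex))
cells$gt <- design$gt[key]

table(cells$gt, useNA = "ifany")
```

With the per-run counts reported above, this mapping gives 4187 + 3273 = 7460 Fib4 cells and 2434 + 954 = 3388 WT cells, and the remaining 130 + 5387 + 95 + 1796 = 7408 cells stay NA, which matches the merged table.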

The Fib4 mutants are a mouse model of frailty. Because of this, either the tissue/cell composition of the original samples or the ability of cells to survive tissue dissociation differs between genotypes. After a first round of naive clustering, I can see clear differences in the abundance of WT/Fib4 cells in several clusters:

[Figure: tSNE]

And there are several clusters dominated by cells with no assigned genotype:

[Figure: table clusters]

Because of this, I am trying to find some way to classify (as many as possible of) the NA cells into one of the two genotypes. What I have tried up to now is:

  • Select the largest cluster with the highest number of both WT and Fib4 cells (cluster 0).
  • Run FindMarkers on the cluster to detect markers that can distinguish the two genotypes.
  • Use the top up and down markers to create a WT.score and Fib4.score and run AddModuleScore with those gene lists on the whole dataset.
  • Classify cells according to those 2 scores.

library(Seurat)
library(dplyr)
library(ggplot2)

# Restrict to cluster 0 and compare genotypes within it
ss <- subset(seu, subset = seurat_clusters == 0)
Idents(ss) <- "gt"
# gt_cl0_markers <- FindMarkers(ss, ident.1 = "Fib4", ident.2 = "WT")
gt_cl0_markers <- FindMarkers(ss, ident.1 = "Fib4", ident.2 = "WT",
                              logfc.threshold = 0.25, test.use = "roc", only.pos = FALSE)

# Top 5 markers by classification power in each direction
gt_cl0_up   <- rownames(gt_cl0_markers[gt_cl0_markers$avg_log2FC > 0, ] %>% top_n(5, power))
gt_cl0_down <- rownames(gt_cl0_markers[gt_cl0_markers$avg_log2FC < 0, ] %>% top_n(5, power))
gt_cl0_markers$dir <- ifelse(gt_cl0_markers$avg_log2FC >= 0, "up", "down")

# One module score per direction (AddModuleScore appends "1" to the name)
ss <- AddModuleScore(ss, features = list(gt_cl0_up), name = "seu_fib4_cl0_up", assay = "RNA", slot = "data")
ss <- AddModuleScore(ss, features = list(gt_cl0_down), name = "seu_fib4_cl0_down", assay = "RNA", slot = "data")

md <- ss@meta.data

# Up score vs down score, one panel per genotype label
ggplot(md, aes(x = seu_fib4_cl0_up1, y = seu_fib4_cl0_down1)) +
  geom_abline(slope = 1, linetype = "dashed") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_point(alpha = 0.25, aes(color = gt)) +
  facet_wrap(~gt) +
  theme_minimal() + theme(aspect.ratio = 1) +
  ggtitle("Genotype") +
  guides(color = guide_legend(override.aes = list(alpha = 1)))
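
The "classify cells according to those 2 scores" step is not in the code above. One simple way to do it, sketched here in base R under my own assumptions, is to call a cell Fib4 when its up score exceeds its down score (i.e. it falls below the dashed diagonal in the plot) and then check that rule against the cells whose genotype is already known from sex. The `md` values below are made up, and `gt_pred`, `confusion`, and `accuracy` are hypothetical names; the diagonal is just one possible decision boundary.

```r
# Toy metadata mimicking ss@meta.data (made-up values for illustration only).
md <- data.frame(
  seu_fib4_cl0_up1   = c(0.8, 0.1, -0.2, 0.5, 0.0,  0.3),
  seu_fib4_cl0_down1 = c(0.1, 0.6,  0.4, 0.7, 0.2, -0.1),
  gt = c("Fib4", "WT", "WT", "Fib4", NA, NA),
  stringsAsFactors = FALSE
)

# Assumed decision rule: Fib4 if the "up" score is larger than the "down" score.
md$gt_pred <- ifelse(md$seu_fib4_cl0_up1 > md$seu_fib4_cl0_down1, "Fib4", "WT")

# Evaluate only on cells with a sex-derived genotype label.
labelled  <- !is.na(md$gt)
confusion <- table(known = md$gt[labelled], predicted = md$gt_pred[labelled])
accuracy  <- mean(md$gt[labelled] == md$gt_pred[labelled])

print(confusion)
print(accuracy)
```

An accuracy close to what always guessing the majority genotype would give means the scores carry little genotype information. Note also that checking the rule on the same cells used to pick the markers is optimistic, so holding out part of the labelled cells would give a fairer estimate.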

Unfortunately, the scores built from these cluster/genotype markers don't seem able to separate the genotypes:

[Figure: scores all cells]

And they don't separate them well even when applied only to the very same cluster used to find the markers:

[Figure: scores cluster 0]

I have tried this with both the default FindMarkers test and test.use = "roc", using the top 5, 20, and 50 markers in each direction.

Is this the right way to infer the genotype/grouping of cells with missing information? Am I doing something wrong? How should I classify cells based on these differentially expressed genes? Or am I doing everything right and simply out of luck with these samples?

Thanks, Txema

cell seurat single score • 124 views