Question

Feedback on integrated dimension reduction plot for sorted CD4/CD8 T cells

0

2.8 years ago

tdfyoder ▴ 40

Hello,

I recently adopted the Harvard Chan Bioinformatics Core guidelines for single-cell QC/normalization/clustering (https://hbctraining.github.io/scRNA-seq_online/schedule/links-to-lessons.html). I have integrated CD4+/CD8+ T cells from two time points.

I recently received feedback that the clustering in my integrated dimension reduction plot looked problematic. Specifically, the concerns were the small peripheral clusters (a splash/star pattern?) and the number of distinct clusters.

Data were normalized using SCTransform. The variables regressed out were the mitochondrial ratio and the G2M-S phase score difference, as suggested for differentiating cell types in the Alternative Workflow of the Seurat cell cycle vignette: https://satijalab.org/seurat/articles/cell_cycle_vignette.html

My clusters were called using 40 PCs at a resolution of 0.6.

As for the number of clusters, TCR beta (TRB) V/D/J genes were identified as strong conserved markers in several clusters. Is it worth excluding the VDJ genes from the analysis?

Any comment on the appearance of the dim plot and implications would be appreciated. Thank you!

enter image description here

10x seurat immunology • 1.6k views

ADD COMMENT • link updated 2.7 years ago by theHumanBorch ▴ 240 • written 2.8 years ago by tdfyoder ▴ 40

0

Is this an integrated dataset? Did you run the Seurat integration routine? Otherwise it is almost certain that much of the cluster separation is due to the batch effects between the samples and time points.

I recently received feedback that my integrated dimension reduction plot clustering looked problematic. Specifically, the small clusters peripheral (splash/star?) and the number of distinct clusters.

What does that mean? Please elaborate.

ADD REPLY • link 2.8 years ago by ATpoint 81k

0

Thank you, ATpoint. This is integrated data. I used SCTransform. Samples 1 and 2 were replicates of the same time point. I have included the code below.

split_seurat <- SplitObject(seurat_phase, split.by = "sample")

split_seurat <- split_seurat[c("samp1_rep1","samp2_rep2","samp3")]



# Normalize each sample separately, regressing out cell cycle difference and mitochondrial ratio
for (i in seq_along(split_seurat)) {
    split_seurat[[i]] <- SCTransform(split_seurat[[i]], vars.to.regress = c("celldif", "mitoRatio"))
}

saveRDS(split_seurat,file= "split_seurat.rds")

##Second script###

split_seurat <- readRDS("split_seurat.rds")

integ_features <- SelectIntegrationFeatures(object.list = split_seurat,
                                            nfeatures = 3000)

# Prepare the SCT list object for integration
split_seurat <- PrepSCTIntegration(object.list = split_seurat,
                                   anchor.features = integ_features)

# Find best buddies - can take a while to run
integ_anchors <- FindIntegrationAnchors(object.list = split_seurat,
                                        normalization.method = "SCT",
                                        anchor.features = integ_features)

# Integrate across conditions
seurat_integrated <- IntegrateData(anchorset = integ_anchors,
                                   normalization.method = "SCT")

As for the appearance of the plot, I am paraphrasing the feedback, since I was confused by it. I think the expectation was fewer clusters, with less delineation and less separation between them. Another concern was that cluster 7 has satellite clusters.

ADD REPLY • link 2.8 years ago by tdfyoder ▴ 40

score 4 · Accepted Answer · 2021-08-09

4

2.7 years ago

theHumanBorch ▴ 240

The sporadic clustering reminds me of using clonotype edit distance for dimensional reduction. I would consider removing the TCR genes, not from the anchoring, but from the variable features that feed the dimensional reduction behind the RunUMAP() call. Below is an example of the problem I encountered when trying to convert TCR edit distance into an assay for a Seurat object.

enter image description here

You can do this with:

quietTCRgenes <- function(sc) {
    # Anchored prefixes for TCR V/D/J genes. No trailing "*": in a regex,
    # "^TRAV*" would also match TRAC, TRAF1 and other non-V/D/J genes.
    unwanted_genes <- "^TRBV|^TRBD|^TRBJ|^TRDV|^TRDD|^TRDJ|^TRAV|^TRAJ|^TRGV|^TRGJ"
    if (inherits(x = sc, what = "Seurat")) {
        # Drop matching genes from the RNA assay's variable features
        # (direct slot access, as in Seurat v3/v4 Assay objects)
        var_features <- sc[["RNA"]]@var.features
        sc[["RNA"]]@var.features <- var_features[!grepl(unwanted_genes, var_features)]
    } else {
        # Bioconductor scran pipelines use a vector of variable genes for DR
        sc <- sc[!grepl(unwanted_genes, sc)]
    }
    return(sc)
}

seuratObj <- quietTCRgenes(seuratObj)
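
Before applying a filter like this, it can help to check what a pattern actually matches on a handful of gene symbols. The sketch below uses base R only. The gene list is a toy example rather than data from this thread, and the compact anchored pattern is an assumption about which genes are meant to be removed (V, D and J segments, but not constant-region genes such as TRAC or unrelated genes such as TRAF1).

```r
# Toy gene symbols (illustrative only, not taken from the thread's data)
genes <- c("TRAV1-2", "TRAJ33", "TRAC", "TRAF1",
           "TRBV20-1", "TRBC1", "TRDV2", "TRGJ1", "CD8A")

# In a regex, "*" means "zero or more of the previous character", so
# "^TRAV*" reads as "TRA" followed by any number of "V"s.
glob_style <- "^TRAV*"
hits_glob <- grep(glob_style, genes, value = TRUE)

# Anchored prefixes without "*" (assumed intent: V/D/J segment genes only)
anchored <- "^TR[ABDG][VJ]|^TR[BD]D"
hits_anchored <- grep(anchored, genes, value = TRUE)

print(hits_glob)      # includes TRAC and TRAF1
print(hits_anchored)  # V and J genes only

# Genes that would survive the anchored filter
kept <- genes[!(genes %in% hits_anchored)]
print(kept)
```

The glob-style pattern silently drops TRAC and TRAF1, while the anchored version leaves the constant-region and unrelated genes in place.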

ADD COMMENT • link 2.7 years ago by theHumanBorch ▴ 240

1

Thank you! I arrived at the same conclusion. I ended up removing the TCR V genes upfront, before performing normalization/integration (code below). I also found this paper, which removed VDJ genes at the find-variable-genes step (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6689255/pdf/nihms-1531727.pdf). Any comments on the best step at which to do this filtering would be appreciated. Thanks!

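library(stringr)  # provides str_subset()

counts <- GetAssayData(filtered_seurat, assay = "RNA")

# TCR V genes to drop (alpha, beta, gamma); anchored so only gene-name prefixes match
TCRv_list <- str_subset(rownames(counts), "^TRAV|^TRBV|^TRGV")

# Logical indexing stays correct even when no genes match
counts <- counts[!(rownames(counts) %in% TCRv_list), ]
refiltered_seurat <- subset(filtered_seurat, features = rownames(counts))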
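
For comparison, filtering at the variable-feature step, as in the paper linked above, leaves the counts untouched and only trims the feature list used downstream. A minimal sketch in base R follows. `drop_tcr_features()` is a hypothetical helper, not a Seurat function, and its default pattern is an assumption about which genes count as V/D/J. The commented lines show where it might sit in the integration code from this thread, and that placement is untested here.

```r
# Hypothetical helper (not part of Seurat): drop TCR V/D/J genes from a
# character vector of feature names.
drop_tcr_features <- function(features,
                              pattern = "^TR[ABDG][VJ]|^TR[BD]D") {
    features[!grepl(pattern, features)]
}

# Toy feature list (illustrative only)
integ_features_toy <- c("CCR7", "TRBV20-1", "GZMK", "TRAV1-2", "TRBC1")
filtered_toy <- drop_tcr_features(integ_features_toy)
print(filtered_toy)

# Possible placement in the integration workflow (untested sketch):
# integ_features <- SelectIntegrationFeatures(object.list = split_seurat,
#                                             nfeatures = 3000)
# integ_features <- drop_tcr_features(integ_features)
# split_seurat <- PrepSCTIntegration(object.list = split_seurat,
#                                    anchor.features = integ_features)
```

With this placement the anchor feature list ends up shorter than the requested 3000 whenever TCR genes were among the selected features.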

enter image description here

ADD REPLY • link 2.7 years ago by tdfyoder ▴ 40

0

Oh interesting, so instead of just removing the genes from the variable gene list in Seurat, you removed the V genes from your counts. It accomplishes the same goal, and the UMAP you have looks good. I am wondering what the effect on the integration step would be - the V/D/J genes are generally heavily represented in the variable genes used for integration in T cell single-cell data sets.

In my experience with BCR, I found that removing the VDJ genes from the variable list smoothed the UMAP, but did not prevent the clonal groups from clustering together. At the time, it made me think that there is a high degree of overlap in feature space between members within a single clonotype. But I am not sure if there is much work on that in the field - the only one that comes to mind is CoNGA, which uses both expression and clonotype for embedding.

ADD REPLY • link 2.7 years ago by theHumanBorch ▴ 240