Question

How to adjust for homotypic doublets in the R package DoubletFinder

2.8 years ago

Stevens ▴ 20

Hi, I have 6 single-cell RNA-seq datasets that I'm running through the Seurat pipeline to identify differentially expressed genes across two conditions. I want to remove doublets from the data first, and DoubletFinder is often suggested as one of the best tools for that.

Part of the "tutorial" for DoubletFinder on their GitHub page (https://github.com/chris-mcginnis-ucsf/DoubletFinder) is to adjust for homotypic doublets. The package has difficulty identifying them, and the adjustment prevents some false-positive doublets from being called.

That's performed in this code:

homotypic.prop <- modelHomotypic(annotations)           ## ex: annotations <- seu_kidney@meta.data$ClusteringResults
nExp_poi <- round(0.075*nrow(seu_kidney@meta.data))  ## Assuming 7.5% doublet formation rate - tailor for your dataset
nExp_poi.adj <- round(nExp_poi*(1-homotypic.prop))

The problem is that they don't explain how to get the ClusteringResults column in meta.data, so I'm not sure how to perform that correction.

Any help in understanding how to produce that information is appreciated, as is any other information on the package, since tutorials on it are hard to find. The only tutorial I have found is https://nbisweden.github.io/workshop-scRNAseq/labs/compiled/seurat/seurat_01_qc.html#Predict_doublets , but it doesn't perform this correction.

doubletfinder seurat doublets • 2.9k views

updated 2.5 years ago by pajucon • 0 • written 2.8 years ago by Stevens ▴ 20

score 0 · Answer 1 · 2021-11-05

The meta.data$ClusteringResults values are derived from a model dataset that has cell type annotations. The authors suggest using a dataset similar to your own to estimate the proportion of each cell type in your data. Since such datasets are rarely available, you can proceed without this information, perform classification (clustering or cell type annotation) on your own data, and then use those results to inform the estimate.
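To make that last step concrete, here is a minimal sketch in base R of how per-cell labels turn into the adjusted doublet count. Several things here are assumptions rather than anything from the DoubletFinder docs: the `annotations` vector is made-up toy data; in a Seurat workflow the labels would typically be the cluster assignments stored in seu@meta.data$seurat_clusters after FindClusters(); and `homotypic_prop_sketch` is a hypothetical stand-in that uses the sum of squared label frequencies, which is my understanding of what modelHomotypic computes, not a copy of the package code.

```r
# Toy per-cell labels (assumption): 1000 cells in three clusters.
# With a real Seurat object you would typically use something like
# annotations <- seu@meta.data$seurat_clusters
annotations <- c(rep("0", 500), rep("1", 250), rep("2", 250))

# Hypothetical stand-in for modelHomotypic(): the chance that two randomly
# paired cells carry the same label, i.e. the sum of squared label frequencies.
homotypic_prop_sketch <- function(annotations) {
  freq <- table(annotations) / length(annotations)
  sum(freq^2)
}

homotypic.prop <- homotypic_prop_sketch(annotations)    # 0.375 for the toy data
nExp_poi       <- round(0.075 * length(annotations))    # 75, assuming a 7.5% doublet rate
nExp_poi.adj   <- round(nExp_poi * (1 - homotypic.prop)) # 47

print(c(homotypic.prop = homotypic.prop,
        nExp_poi = nExp_poi,
        nExp_poi.adj = nExp_poi.adj))
```

The more cells sit in one large cluster, the larger homotypic.prop gets and the fewer doublets you ask DoubletFinder to call.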

Luckily, homotypic doublets may not be a problem, depending on the type of analysis you are performing. If you have some doublets of the same cell type and their counts are normalized, their profiles will generally resemble those of single cells of that type.

For a sanity check, try an arbitrary value for homotypic.prop and see how large the effect is on downstream analyses. Using the pbmc3k data:

homotypic.prop = 0.173
nExp_poi = 198
nExp_poi.adj = 164
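A quick way to run that sanity check is to sweep a few homotypic proportions and see how much the adjusted count moves. This is only a sketch of the arithmetic: the cell count of 2638 is an assumption (what the standard Seurat pbmc3k tutorial filtering leaves), and the values in `props` other than 0.173 are arbitrary.

```r
# Assumed number of cells in pbmc3k after standard QC filtering
n_cells <- 2638

# Expected doublets at an assumed 7.5% doublet formation rate
nExp_poi <- round(0.075 * n_cells)

# Arbitrary homotypic proportions to compare (0.173 is the value quoted above)
props <- c(0, 0.1, 0.173, 0.3)

# Adjusted expected doublet count for each proportion
nExp_adj <- round(nExp_poi * (1 - props))

print(data.frame(homotypic.prop = props, nExp_poi.adj = nExp_adj))
```

You can then run DoubletFinder with each adjusted count and compare how much the downstream results change.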