Question: Algorithm for clustering single cells based on SNPs?
13 months ago by
A248 wrote:

We have single-cell data that was pooled from 3 different organisms (same species) into one sample and processed with Cell Ranger and Seurat 3. We are now trying to separate the cells by organism of origin using SNP variation.

What would be the best algorithm for this?



Shouldn't the Seurat object already contain the batch information? Also, 10x data comes from 3' sequencing, so I am not sure whether you could analyze SNPs with it.

modified 13 months ago • written 13 months ago by shoujun.gu310
13 months ago by
Memphis, TN
jared.andrews07 wrote:

Tools for this kind of analysis are only just starting to appear. The easiest I've found to use is vartrix, though it doesn't do exactly what you want. It takes a VCF file that you supply and genotypes each cell at every variant where that cell has coverage.

Of course, many cells have only one or two (or zero) reads at a given variant, so you can really only say which cells are known to carry the variant allele. You can't say whether they are heterozygous, homozygous, or wild-type; any cell without an observed variant allele is simply ambiguous.

It doesn't perform variant calling or clustering based on the SNPs, but it would help you define which cells came from which sample if you have mutation information on each of them.
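As a rough illustration of that last point, here is a minimal Python sketch of assigning cells to samples from alt-allele observations alone. It assumes you already have, for each cell barcode, the set of variants where the alt allele was seen, and for each sample, the set of variants it is known to carry. The function name `assign_cells`, the input layout, and the `min_margin` threshold are all hypothetical and are not part of vartrix. Only variants private to one sample are scored, and the absence of an alt read is never used as evidence, in line with the ambiguity described above.

```python
def assign_cells(alt_observed, sample_variants, min_margin=1):
    """Assign each cell to a sample using only positive alt-allele evidence.

    alt_observed:    dict of cell barcode -> set of variant IDs with an alt read
    sample_variants: dict of sample name -> set of variant IDs known in that sample
    Returns a dict of barcode -> sample name, or None when the call is ambiguous.
    """
    # Keep only variants private to one sample; shared variants cannot discriminate.
    private = {}
    for sample, variants in sample_variants.items():
        others = set().union(
            *(v for s, v in sample_variants.items() if s != sample)
        )
        private[sample] = variants - others

    assignments = {}
    for barcode, seen in alt_observed.items():
        # Score = number of private variants of each sample observed in this cell.
        scores = {s: len(seen & p) for s, p in private.items()}
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        best_sample, best_score = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if best_score > 0 and best_score - runner_up >= min_margin:
            assignments[barcode] = best_sample
        else:
            # No informative reads, or conflicting evidence (possible doublet).
            assignments[barcode] = None
    return assignments
```

Cells with no informative reads, or with private variants from more than one sample, stay unassigned rather than being forced into a group.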


Thanks! We don't have a VCF file for the particular species we are working with, which would be a problem with vartrix.

We plan to focus only on genes that are highly expressed in >90% of cells (mitochondrial genes in particular). We are not interested in identifying genome-wide SNPs; we just want to find enough SNPs in these highly expressed genes to separate the pooled cells by their original organisms.
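One way to picture this without a VCF is the Python sketch below, which groups cells by their alt-allele fractions at a handful of candidate sites. It is a simple k-means variant that skips uncovered sites, not the method of any published tool. It assumes the per-cell alt fractions have already been computed, with `None` marking sites a cell does not cover; the function names and the farthest-point seeding are illustrative choices, and at least `k` cells must have some coverage.

```python
def masked_distance(cell, centroid):
    """Mean squared difference over the sites this cell covers (None if none)."""
    diffs = [(x - c) ** 2 for x, c in zip(cell, centroid) if x is not None]
    return sum(diffs) / len(diffs) if diffs else None


def nearest(cell, centroids):
    """Return (index, distance) of the closest centroid, or (None, None)."""
    best = (None, None)
    for i, centroid in enumerate(centroids):
        d = masked_distance(cell, centroid)
        if d is not None and (best[1] is None or d < best[1]):
            best = (i, d)
    return best


def fill(cell):
    """Copy a cell profile, replacing uncovered sites with a neutral 0.5."""
    return [0.5 if x is None else x for x in cell]


def cluster_cells(cells, k=3, n_iter=20):
    """cells: list of per-site alt-allele fractions (0..1, or None if uncovered).
    Returns one cluster label per cell; None for cells with no coverage at all."""
    covered = [c for c in cells if any(x is not None for x in c)]
    # Seed with the best-covered cell, then repeatedly add the cell that is
    # farthest from every centroid chosen so far.
    first = max(covered, key=lambda c: sum(x is not None for x in c))
    centroids = [fill(first)]
    while len(centroids) < k:
        far = max(covered, key=lambda c: nearest(c, centroids)[1])
        centroids.append(fill(far))

    labels = [None] * len(cells)
    for _ in range(n_iter):
        new_labels = [nearest(c, centroids)[0] for c in cells]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [c for c, lab in zip(cells, labels) if lab == j]
            for s in range(len(centroids[j])):
                observed = [c[s] for c in members if c[s] is not None]
                if observed:  # keep the old value when no member covers the site
                    centroids[j][s] = sum(observed) / len(observed)
    return labels
```

Cluster labels are arbitrary integers, so mapping each cluster back to a specific organism would still need some external information.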

written 13 months ago by A248

I guess you could pull all reads from those genes for each sample individually and then call variants with your favorite caller, but I'd expect any basic filters to throw out almost everything you find because of low coverage/confidence. I certainly wouldn't trust the calls all that much.
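To make the coverage problem concrete, here is a small Python sketch of the kind of filter I mean. It assumes the reads have already been reduced to `(site, base)` pairs, and it pools them across all cells, since the cells here were sequenced together. The function name `candidate_sites` and both thresholds are made up for illustration; this is not a substitute for a real variant caller.

```python
from collections import Counter, defaultdict


def candidate_sites(observations, min_depth=20, min_minor_fraction=0.1):
    """Tally bases per site and keep sites that look polymorphic.

    observations: iterable of (site, base) pairs, one per read covering the site
    Returns a dict of site -> (major_base, minor_base, minor_fraction).
    """
    counts = defaultdict(Counter)
    for site, base in observations:
        counts[site][base] += 1

    kept = {}
    for site, bases in counts.items():
        depth = sum(bases.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        ranked = bases.most_common(2)
        if len(ranked) < 2:
            continue  # only one allele observed
        (major, _), (minor, minor_count) = ranked
        fraction = minor_count / depth
        if fraction >= min_minor_fraction:
            kept[site] = (major, minor, fraction)
    return kept
```

With three pooled organisms, a variant private to one of them would be expected at a minor fraction of very roughly one third in a gene expressed at similar levels in all cells, so the thresholds would need tuning to the actual data.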

Is there a particular reason you're trying to do this rather than just adding the sample names to the metadata before merging?

written 13 months ago by jared.andrews07