Question: Algorithm for clustering single cells based on SNPs?
A248 wrote (4 weeks ago):

We have single-cell data from three individual organisms of the same species that were pooled into one sample (processed using Cell Ranger and Seurat 3). We are now trying to separate the cells by organism of origin using SNP variation.

What would be the best algorithm for this?

Thanks!


Shouldn't the Seurat object contain the batch information? Also, 10X data is 3' sequencing, so I am not sure whether you can analyze SNPs in these data.

written 4 weeks ago by shoujun.gu250
jared.andrews07 (St. Louis, MO) wrote (4 weeks ago):

Tools for this kind of task are only just starting to appear. The easiest I've found to use is vartrix, though it doesn't strictly do what you want. It takes a VCF file that you supply and genotypes each cell at every variant for which that cell has coverage.

Of course, many cells have only one or two (or zero) reads at a given variant, so you can really only say which cells you know carry the variant allele. You can't say whether a cell is heterozygous, homozygous, or wild-type; any cell without a read supporting the variant allele is simply ambiguous.

It doesn't perform variant calling or clustering based on the SNPs, but it would help you define which cells came from which sample if you have mutation information on each of them.
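As a rough illustration of that last point, here is a minimal sketch of assigning cells to samples once you have a cells-by-variants table of alt-supporting read counts and know which variants each sample carries. The function name `assign_cells`, the input layout, and the `min_margin` threshold are assumptions made for this example; none of it is part of vartrix. In line with the caveat above, only reads that support the variant allele count as evidence, and a cell with no such reads is left unassigned.

```python
def assign_cells(alt_counts, sample_variants, min_margin=1):
    """Assign each cell to the sample whose known variants it best matches.

    alt_counts: list of per-cell lists, alt-supporting read counts per variant.
    sample_variants: list of sets, the variant indices each sample carries.
    Returns a list of sample indices, with None for ambiguous cells.
    """
    assignments = []
    for cell in alt_counts:
        # Only variants with at least one alt-supporting read are informative.
        seen = [i for i, n in enumerate(cell) if n > 0]
        if not seen:
            assignments.append(None)
            continue
        scores = []
        for carried in sample_variants:
            support = sum(1 for i in seen if i in carried)
            conflicts = len(seen) - support
            scores.append(support - conflicts)
        ranked = sorted(scores, reverse=True)
        # Require a clear winner; ties may be doublets or shared variants.
        if len(ranked) > 1 and ranked[0] - ranked[1] < min_margin:
            assignments.append(None)
        else:
            assignments.append(scores.index(ranked[0]))
    return assignments
```

Cells showing alt reads for variants private to two different samples end up as None here, which is one crude way to flag possible doublets.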


Thanks! We don't have a VCF file for the particular species we are working with, which would be a problem with vartrix.

We plan to focus only on genes that are highly expressed in >90% of cells (mitochondrial genes in particular). We are not interested in identifying genome-wide SNPs as a study in itself; we just want to find enough SNPs in these highly expressed genes to separate the pooled cells by their original organisms.
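One way such a plan could be prototyped without a VCF is to cluster cells directly on their alt allele fractions at the well-covered sites, skipping sites a cell has no reads for. The sketch below is a simple masked k-means in plain Python, not an established tool; the function names, the farthest-point initialisation, and the input layout (per-cell ref and alt read counts at candidate sites) are all assumptions made for illustration.

```python
def alt_fractions(ref, alt):
    """Per-cell alt allele fraction; None where a cell has no reads at a site."""
    return [[a / (r + a) if (r + a) > 0 else None
             for r, a in zip(ref_row, alt_row)]
            for ref_row, alt_row in zip(ref, alt)]


def masked_distance(x, y):
    """Mean squared difference over sites observed in both vectors."""
    shared = [(a - b) ** 2 for a, b in zip(x, y)
              if a is not None and b is not None]
    return sum(shared) / len(shared) if shared else None


def cluster_cells(fractions, k=3, n_iter=20):
    """Group cells into k clusters, ignoring unobserved sites."""
    # Deterministic farthest-point start, seeded with the best-covered cell.
    coverage = [sum(v is not None for v in row) for row in fractions]
    centers = [list(fractions[coverage.index(max(coverage))])]
    while len(centers) < k:
        def gap(row):
            ds = [masked_distance(row, c) for c in centers]
            # Cells sharing no sites with a center are not treated as far away.
            return min(d if d is not None else 0.0 for d in ds)
        centers.append(list(max(fractions, key=gap)))

    labels = [None] * len(fractions)
    for _ in range(n_iter):
        new_labels = []
        for row in fractions:
            ds = [masked_distance(row, c) for c in centers]
            scored = [(d, j) for j, d in enumerate(ds) if d is not None]
            new_labels.append(min(scored)[1] if scored else None)
        if new_labels == labels:
            break
        labels = new_labels
        # Update each center from its members, site by site.
        for j in range(k):
            members = [row for row, lab in zip(fractions, labels) if lab == j]
            for s in range(len(centers[j])):
                vals = [row[s] for row in members if row[s] is not None]
                if vals:
                    centers[j][s] = sum(vals) / len(vals)
    return labels
```

The cluster numbers returned are arbitrary labels, so they would still need to be matched to the actual organisms by some other means, and cells with very few covered sites can easily be misplaced.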

written 29 days ago by A248

I guess you could just pull all reads from those genes for each sample individually, then call variants with your favorite caller, but I'd expect that any basic filters would throw out almost everything you find due to low coverage/confidence. I certainly wouldn't trust them all that much.
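To make the filtering concern concrete, here is a small sketch of the kind of basic depth filter a variant-calling workflow might apply once reads at each candidate site have been counted and pooled across cells. The function name `callable_sites` and the thresholds `min_depth` and `min_alt_reads` are hypothetical values chosen for illustration, not defaults of any particular caller.

```python
def callable_sites(ref, alt, min_depth=10, min_alt_reads=3):
    """Pool read counts across cells and keep sites passing simple filters.

    ref, alt: lists of per-cell lists of read counts, one entry per site.
    Returns the indices of sites with enough total depth and alt support.
    """
    n_sites = len(ref[0])
    kept = []
    for s in range(n_sites):
        ref_total = sum(row[s] for row in ref)
        alt_total = sum(row[s] for row in alt)
        if ref_total + alt_total >= min_depth and alt_total >= min_alt_reads:
            kept.append(s)
    return kept
```

With sparse 3' data, comparing the length of the returned list to the number of candidate sites gives a quick sense of how much survives filtering.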

Is there a particular reason you're trying to do this rather than just adding the sample names to the metadata before merging?

written 29 days ago by jared.andrews07