Identifying tumor cells in single cell RNAseq data
1
1
2.6 years ago
jrleary ▴ 190

My lab does a lot of single cell RNAseq of samples that include tumor cells. I have a pipeline in place that reduces dimensionality, clusters cells, and automatically assigns cell type labels to those clusters (mostly using Seurat and SingleR). However, I do not currently have a way to differentiate tumor cells from normal cells, which would allow me to perform other downstream analyses such as copy number variation analysis using inferCNV. Are there any tools in R/Python/Bash etc. that would allow me to differentiate between those two cell types, or is doing so manually using biomarkers the best option? I'd like the process to be as objective / reproducible as possible.

cancer scRNAseq R • 2.2k views
2

It will depend on your tumor type. Some are easier to differentiate than others.

You don't necessarily need to identify tumor cells to call CNVs. In fact, you could call CNVs to identify tumor cells.

0

I think I'll be giving CONICSmat a go as a method of identifying CNVs / tumor cells. Thanks for the input.

0

If you are interested, there are a few other alternatives discussed here: Detecting copy number alterations based on RNA-seq data

0

That thread of yours is where I found CONICSmat in the first place :) With regard to using CNVs to identify tumor cells, are there any best-practices documents / guides floating around? I come from a pure stats background so I'm less familiar with some of the (to me) more complicated biology concepts.

2

Usually, only tumor cells should have copy number abnormalities. In panel B below (from Patel et al.), you can see that the topmost cluster has a flat copy number profile and contains the normal cells.
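As a rough illustration of the "flat profile = normal" idea, here is a minimal sketch. It assumes you already have a per-cell relative copy-number profile (for example, from an inferCNV- or CONICSmat-style analysis) and a cluster label per cell. The function names and the toy numbers are made up for illustration and are not part of any package.

```python
from statistics import mean

def cnv_score(profile):
    """Mean squared deviation of one cell's relative CNV profile from zero."""
    return mean(x * x for x in profile)

def flattest_cluster(profiles, clusters):
    """Return the cluster with the lowest mean CNV score, plus all cluster scores."""
    by_cluster = {}
    for profile, label in zip(profiles, clusters):
        by_cluster.setdefault(label, []).append(cnv_score(profile))
    cluster_scores = {label: mean(s) for label, s in by_cluster.items()}
    return min(cluster_scores, key=cluster_scores.get), cluster_scores

# Toy data (hypothetical): rows are cells, columns are genomic windows.
profiles = [
    [0.0, 0.1, -0.1, 0.0],   # flat, normal-like
    [0.1, 0.0, 0.0, -0.1],   # flat, normal-like
    [0.8, 0.9, -0.7, 0.0],   # gains and losses, tumor-like
    [0.9, 0.7, -0.8, 0.1],   # gains and losses, tumor-like
]
clusters = ["c1", "c1", "c2", "c2"]

normal_like, scores = flattest_cluster(profiles, clusters)
print(normal_like, scores)
```

A low score only says the cluster looks copy-number neutral at this resolution; whether that means "normal" still depends on the tumor type.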

0

This is generally true, but does depend somewhat on the tumor type. Certain leukemias have "progenitor" or "poised" populations that may still harbor significant genetic variation despite not being truly malignant. This is where your biological expertise is going to have to come into play.

0

Thank you both. I had been using the Patel paper as a reference but it looks like I'm going to have to do a much deeper dive research-wise before I start analyzing anything. I don't want to be lacking in domain knowledge.

1

It really helps if you know what to look for. If you have any clinical karyotype data, it can make your life a lot easier. scRNA CNV calling is coarse - you aren't going to pick up many focal changes (< 1 Mb). If you have a clinical collaborator who provided you the samples, bug them to give you any information they might have available. If your cancer of interest has highly recurrent copy number alterations, that can also help, but there are always variations. Speaking from experience, prior information makes the process much, much easier.

0

I'll see what I can do, but I believe at the moment we only have scRNA data, maybe some paired bulk RNAseq data. With those resources, do you think trying to estimate CNVs is worth the time or would it be too noisy?

0

Oh, it can totally be valuable. I'm just not sure it's the best tool to differentiate malignant and normal cells, but again, that's highly cancer-type dependent.

It's also not terribly difficult to do, so I'd say the upside is strong - just trying to make sure you're aware of some of the caveats.

0

The data we're analyzing is from PDAC, so I'll be doing some pancreas-specific research. Are there any other extant computational methods you'd recommend for differentiating between malignant and normal?

1

I'm not familiar with that cancer type, so I'm afraid you're on your own there. The suggestions in my answer might be helpful, but I don't know enough about the data/cancer to say which is your best bet.

2

Just an update for future readers: I've had decent success replicating CNV analyses with CONICSmat on publicly available PDAC scRNA-seq data. Obviously processing, filtering, normalization, etc. methods are going to differ between labs but I've been able to see the strongest amplifications and deletions fairly clearly after my analysis.

3
2.6 years ago

As igor said, it really depends on the type. For immune cells, the easiest way to do this is usually to perform single cell VDJ sequencing on the same cells, which yields clonality information (most immune cancers are highly clonal).
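To make the clonality idea concrete, here is a small sketch that computes, per cluster, the fraction of cells with a VDJ call that belong to the single most common clonotype. It assumes you have already joined clonotype IDs to cluster labels; the input format and names are hypothetical, not the output of any particular VDJ tool.

```python
from collections import Counter, defaultdict

def top_clone_fraction(cells):
    """cells: iterable of (cluster, clonotype_id or None).

    Returns {cluster: fraction of clonotyped cells in the dominant clonotype}.
    Cells without a clonotype call are ignored.
    """
    per_cluster = defaultdict(Counter)
    for cluster, clonotype in cells:
        if clonotype is not None:
            per_cluster[cluster][clonotype] += 1
    return {
        cluster: counts.most_common(1)[0][1] / sum(counts.values())
        for cluster, counts in per_cluster.items()
    }

# Toy data (hypothetical): cluster A is dominated by one clone, B is diverse.
cells = [
    ("A", "x"), ("A", "x"), ("A", "x"), ("A", "y"), ("A", None),
    ("B", "p"), ("B", "q"), ("B", "r"), ("B", "s"),
]
fractions = top_clone_fraction(cells)
print(fractions)
```

A cluster where one clonotype accounts for most cells is a candidate malignant population; the cutoff for "most" is a judgment call and is not something this sketch decides for you.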

If your cancer has a heavy genetic component, you can try to use mutation information, provided you know the malignant populations harbor a given mutation (for example, because you performed bulk exome-seq or WGS). vartrix is a pretty easy tool to use for this. You can then identify malignant clusters fairly easily by their enrichment for the mutation.
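One way to put a number on "enrichment of the mutation" is a one-sided hypergeometric test per cluster. The sketch below assumes you have already reduced the per-cell variant calls to a simple mutant / not-mutant flag for each cell; that reduction step, the input format, and the function names are my assumptions, not something vartrix produces directly.

```python
from collections import Counter
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n cells are drawn from N cells, K of which are mutant."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def enrichment_by_cluster(cells):
    """cells: list of (cluster, is_mutant). Returns {cluster: one-sided p-value}."""
    N = len(cells)
    K = sum(1 for _, mutant in cells if mutant)
    size = Counter(cluster for cluster, _ in cells)
    mutant = Counter(cluster for cluster, m in cells if m)
    return {c: hypergeom_upper_tail(mutant[c], K, size[c], N) for c in size}

# Toy data (hypothetical): 5 of 6 "c_tumor" cells and 1 of 14 "c_other" cells are mutant.
cells = (
    [("c_tumor", True)] * 5 + [("c_tumor", False)]
    + [("c_other", True)] + [("c_other", False)] * 13
)
pvals = enrichment_by_cluster(cells)
print(pvals)
```

Keep in mind that dropout means many truly mutant cells will have no reads covering the site, so absence of a call is weak evidence; with many clusters you would also want a multiple-testing correction.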

Lastly, if you've used SingleR, you have a pretty good idea of which clusters contain which cell types. Pick a cluster of cells that shouldn't be malignant to use as your controls.
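To show what using a control cluster can look like in practice, here is a deliberately simplified sketch: subtract the control cells' average expression from every cell, then smooth along genomic order so that regional gains and losses stand out. It assumes a log-normalized cells x genes matrix with genes already sorted by position on one chromosome. This is a toy version of the general idea behind expression-based CNV inference, not the actual algorithm used by inferCNV or CONICSmat, and all names are hypothetical.

```python
from statistics import mean

def moving_average(values, window):
    """Centered moving average; the window is truncated at the ends."""
    half = window // 2
    return [mean(values[max(0, i - half): i + half + 1]) for i in range(len(values))]

def relative_profiles(expr, is_reference, window=3):
    """Subtract the reference-cell mean per gene, then smooth across genes."""
    ref_cells = [row for row, ref in zip(expr, is_reference) if ref]
    if not ref_cells:
        raise ValueError("at least one reference (control) cell is required")
    baseline = [mean(col) for col in zip(*ref_cells)]
    return [
        moving_average([x - b for x, b in zip(row, baseline)], window)
        for row in expr
    ]

# Toy data (hypothetical): two control cells and one cell with a regional gain.
expr = [
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 2.0, 2.0, 2.0, 1.0],
]
is_reference = [True, True, False]

profiles = relative_profiles(expr, is_reference)
for row in profiles:
    print([round(v, 2) for v in row])
```

Real tools use much wider windows (on the order of a hundred genes) and extra denoising steps, so treat this only as a way to see why the choice of control cells matters: whatever they express becomes the baseline.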