Hi everyone,
I am trying to identify tumor cells in an scRNA-seq dataset from cancer patients using copykat (https://github.com/navinlabcode/copykat). Following the manual's recommendations, I ran the analysis sample by sample using the code below:
for (sample in samples) {
  # Keep only the cells that belong to the current sample
  sample_obj <- subset(seurat_obj.filter, subset = orig.ident == sample)
  # Raw count matrix (genes x cells) used as copykat input
  count_mtx <- sample_obj@assays$RNA@counts
  copykat_result <- copykat(
    rawmat = count_mtx,
    id.type = "S",
    ngene.chr = 5,
    win.size = 25,
    KS.cut = 0.1,
    sam.name = sample,
    distance = "euclidean",
    norm.cell.names = "",  # empty string: no known normal cells supplied
    output.seg = "FALSE",
    plot.genes = "TRUE",
    genome = "hg20",
    n.cores = 1
  )
  # `dataset` must be defined before the loop
  save(copykat_result, file = paste0("copykat_result.", dataset, "-", sample, ".Rdata"))
}
However, when I checked the results, I noticed that a substantial proportion of cells (around 50% in one particular sample) from a normal sample were incorrectly classified as tumor cells. I suspect this might be due to suboptimal parameter settings, but I’m not sure how to adjust them effectively.
I would greatly appreciate any suggestions or advice on how to optimize the parameters or improve the analysis.
Thanks in advance!
Provide a vector of representative normal cells from each normal sample in the norm.cell.names option.
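A minimal sketch of how such a vector could be built, assuming the metadata has some annotation that marks known-normal cells. The helper `pick_normal_cells`, the `cell_type` column, and the toy metadata are hypothetical and only stand in for `sample_obj@meta.data`; for a sample that is entirely normal tissue, every cell could be flagged instead.

```r
# Hypothetical helper (not part of copykat): return the barcodes of cells
# flagged as normal, randomly capped at n_max cells.
pick_normal_cells <- function(meta, is_normal, n_max = 200, seed = 1) {
  stopifnot(length(is_normal) == nrow(meta))
  candidates <- rownames(meta)[which(is_normal)]
  if (length(candidates) <= n_max) {
    return(candidates)
  }
  set.seed(seed)  # reproducible subsample; note this resets the global RNG
  sample(candidates, n_max)
}

# Toy metadata standing in for sample_obj@meta.data (assumption: a
# `cell_type` column exists; adapt to whatever annotation is available).
meta <- data.frame(
  cell_type = c("T cell", "Epithelial", "T cell", "Fibroblast", "Epithelial"),
  row.names = paste0("cell", 1:5)
)

# Assumption: immune and stromal cells are treated as confident normals.
normal_cells <- pick_normal_cells(
  meta,
  meta$cell_type %in% c("T cell", "Fibroblast")
)
print(normal_cells)

# Inside the loop, this vector would replace the empty string:
#   norm.cell.names = normal_cells
```

The names passed in need to match the column names of the count matrix given to `rawmat`, so it is worth checking `all(normal_cells %in% colnames(count_mtx))` before running copykat.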