Hi,
I am doing QC on single-cell RNA-seq data. I have 3 samples: tumor, normal, and adjacent. All samples show a strongly right-skewed distribution of the number of genes per cell, i.e. most cells have few detected genes. So far I have only filtered out cells with > 10% mitochondrial reads and cells with < 200 genes. I wonder whether I should raise the lower cutoff to filter out more cells, or add an upper cutoff to remove multiplets (I do not know if I need to, since the data contains tumor cells). I am using the Seurat pipeline.
I do not know how to post images here, but I would really appreciate any suggestions, guidelines, or papers to refer to.
Thank you in advance.
Can you please add some plots to illustrate the problems? Anecdotal descriptions are hard to follow. For QC I personally check for four parameters:
Each metric is run through a median absolute deviation (MAD) calculation, and cells with values more than 3x MAD from the median in either direction are removed for that particular QC metric. For total UMI I even use 2x MAD, since I think that regardless of cellular heterogeneity UMI counts should be fairly similar; anything else is either a low-quality cell or a doublet. Generally it makes sense to plot the data as violins and then check by eye whether the automated MAD cutoffs make sense. Looking at the data on both linear and log scale can help to decide on upper and lower cutoffs. In the end you want to remove the outliers from the bulk of the cells. If you end up with clusters that look suspicious because they express unusual combinations of marker genes, you might want to run an explicit doublet detection tool, as they might be an artifact cluster of doublets that survived the initial QC.
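To make the MAD rule concrete, here is a minimal sketch in Python with NumPy. It is only an illustration, not the code of any particular package: the function name `mad_outliers`, its parameters, and the toy UMI counts are all hypothetical, and scaling the MAD by 1.4826 (so it is comparable to a standard deviation for normal data) is an assumption on my side.

```python
import numpy as np

def mad_outliers(values, nmads=3.0, log=False, direction="both"):
    """Return a boolean mask that is True for cells flagged as outliers.

    Hypothetical helper for illustration only.
    values    : one QC metric per cell (e.g. total UMI, genes detected)
    nmads     : number of MADs from the median that is still tolerated
    log       : apply log1p first, useful for right-skewed metrics
    direction : "both", "lower" or "higher"
    """
    x = np.asarray(values, dtype=float)
    if log:
        x = np.log1p(x)
    med = np.median(x)
    # Assumption: scale the raw MAD by 1.4826 so that it approximates
    # the standard deviation when the data are normally distributed.
    mad = 1.4826 * np.median(np.abs(x - med))
    lower = med - nmads * mad
    upper = med + nmads * mad
    if direction == "lower":
        return x < lower
    if direction == "higher":
        return x > upper
    return (x < lower) | (x > upper)

# Toy example with made-up UMI counts: six typical cells,
# one very high (possible doublet) and one very low (possible debris).
total_umi = np.array([1000, 1100, 900, 1050, 950, 1000, 5000, 50])
flagged = mad_outliers(total_umi, nmads=2.0)  # stricter 2x MAD for total UMI
kept = total_umi[~flagged]
print(flagged)
print(kept)
```

The mask is computed per metric, so in practice you would compute one mask for each QC metric and drop any cell flagged by at least one of them. Compare the resulting `lower` and `upper` bounds against the violin plots before trusting them.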
Seurat is probably fine, but I personally prefer the Bioconductor workflow because its documentation works much better for me: http://bioconductor.org/books/release/OSCA/