How to QC single-cell RNA-seq data from tumor samples when the distribution is right-skewed?
7 months ago

Hi,

I am doing QC on single-cell RNA-seq data. I have 3 samples: tumor, normal, and adjacent. In all of them the distribution of the number of genes per cell is strongly right-skewed, i.e. most cells have relatively few genes. So far I have only filtered out cells with > 10% mitochondrial reads and cells with < 200 genes. Should I raise the lower cutoff to filter out more cells, or should I also set an upper cutoff to remove multiplets (I do not know whether I need to, since the samples contain tumor cells)? I am using the Seurat pipeline.
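For reference, the filter described above amounts to two fixed cutoffs applied to every cell. A minimal Python sketch, not Seurat code, assuming each cell is a dict of precomputed metrics (the field names `n_genes` and `pct_mito` are hypothetical):

```python
# Fixed-threshold QC matching the cutoffs described above: drop cells with
# more than 10% mitochondrial reads or fewer than 200 detected genes.
# The field names "n_genes" and "pct_mito" are hypothetical.

def fixed_threshold_filter(cells, min_genes=200, max_pct_mito=10.0):
    """Return only the cells that pass both fixed cutoffs."""
    return [
        c for c in cells
        if c["n_genes"] >= min_genes and c["pct_mito"] <= max_pct_mito
    ]


cells = [
    {"id": "c1", "n_genes": 1500, "pct_mito": 4.2},
    {"id": "c2", "n_genes": 150, "pct_mito": 3.0},    # fails: < 200 genes
    {"id": "c3", "n_genes": 2200, "pct_mito": 18.5},  # fails: > 10% mito
]
print([c["id"] for c in fixed_threshold_filter(cells)])  # ['c1']
```

In Seurat the equivalent step is typically a `subset()` call on the `nFeature_RNA` and `percent.mt` metadata columns (`percent.mt` being the name used in the Seurat tutorial). Fixed cutoffs like these do not adapt to the shape of each sample's distribution.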

I do not know how to post images here, but I would really appreciate any suggestions, guidelines, or papers to refer to.

Thank you in advance.


Can you please add some plots to illustrate the problem? Anecdotal descriptions are hard to follow. For QC I personally check four parameters:

1. % of reads mapping to mitochondrial genes (usually 5 to 10% is a cutoff here)
2. % reads mapping to rRNA genes (a good sample usually has < 1%)
3. total number of detected genes, i.e. genes with > 0 counts (this is probably cell-type-specific; usually somewhere around 3k-7k genes)
4. total UMI count per cell (depends heavily on the setup; look at the QC violin plots to find a cutoff that separates the bulk from the outliers).
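To make the four metrics concrete, here is a minimal Python sketch that computes them per cell from a raw count table. It is not Seurat or Bioconductor code; identifying mitochondrial genes by the human `MT-` symbol prefix and passing rRNA genes as an explicit set (`RNA18SN5` is only an example) are assumptions to adapt to your annotation.

```python
# Per-cell QC metrics from a raw count table (cells x genes).
# Assumptions: mitochondrial genes carry the human "MT-" prefix, and rRNA
# genes are supplied explicitly as a set of gene names.

def qc_metrics(counts, genes, rrna_genes, mito_prefix="MT-"):
    """counts: one list per cell, each aligned with `genes`.
    Returns one dict of QC metrics per cell."""
    is_mito = [g.startswith(mito_prefix) for g in genes]
    is_rrna = [g in rrna_genes for g in genes]
    metrics = []
    for cell in counts:
        total = sum(cell)
        mito = sum(x for x, m in zip(cell, is_mito) if m)
        rrna = sum(x for x, r in zip(cell, is_rrna) if r)
        metrics.append({
            "total_umi": total,                        # metric 4
            "n_genes": sum(1 for x in cell if x > 0),  # metric 3: genes with > 0 counts
            "pct_mito": 100.0 * mito / total if total else 0.0,  # metric 1
            "pct_rrna": 100.0 * rrna / total if total else 0.0,  # metric 2
        })
    return metrics


genes = ["MT-CO1", "RNA18SN5", "ACTB", "GAPDH"]
counts = [[10, 0, 60, 30], [50, 5, 45, 0]]
for m in qc_metrics(counts, genes, rrna_genes={"RNA18SN5"}):
    print(m)
```

In Seurat, `nFeature_RNA` and `nCount_RNA` hold the detected-gene and UMI counts, and `PercentageFeatureSet(obj, pattern = "^MT-")` computes the mitochondrial percentage.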

Each of these metrics is then run through a median absolute deviation (MAD) calculation, and cells with values more than 3 MADs from the median in either direction are removed for that particular metric. For total UMI I even use 2x MAD, since I think that regardless of cellular heterogeneity UMI counts should be fairly similar; everything outside that range is either a low-quality cell or a doublet.

Generally it makes sense to plot the data as violins and check by eye whether the automated MAD cutoffs make sense. Looking at the data on both linear and log scale can help you decide on upper and lower cutoffs; the goal is to remove the outliers from the bulk. If you end up with clusters that look suspicious, e.g. they express unusual combinations of marker genes, you might want to run an explicit doublet detection tool: they might be an artifact cluster of doublets that survived the initial QC.
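The MAD rule above can be sketched in plain Python as follows. This is an illustration, not the answerer's actual code. Assumptions: the MAD is scaled by 1.4826 (the default constant of R's `mad()`, which makes it comparable to a standard deviation for normal data); the right-skewed count metrics are assessed on a `log1p` scale, which is one way to act on the "look at the log scale" advice for skewed data; and the per-metric rule table is hypothetical.

```python
import math
from statistics import median


def mad_outliers(values, nmads=3.0, direction="both", log=False, scale=1.4826):
    """Flag values more than `nmads` scaled MADs from the median.
    direction: "both", "lower", or "higher". Returns a list of booleans."""
    x = [math.log1p(v) for v in values] if log else list(values)
    med = median(x)
    mad = scale * median(abs(v - med) for v in x)
    lower, upper = med - nmads * mad, med + nmads * mad
    flags = []
    for v in x:
        low = direction in ("both", "lower") and v < lower
        high = direction in ("both", "higher") and v > upper
        flags.append(low or high)
    return flags


# Hypothetical per-metric rules: (nmads, direction, use log scale?).
# 3x MAD in both directions, 2x MAD for total UMI, as described above.
RULES = {
    "n_genes": (3.0, "both", True),
    "pct_mito": (3.0, "both", False),
    "pct_rrna": (3.0, "both", False),
    "total_umi": (2.0, "both", True),
}


def keep_cells(metrics, rules=RULES):
    """Keep a cell only if it is not an outlier for any metric in `rules`."""
    keep = [True] * len(metrics)
    for name, (nmads, direction, log) in rules.items():
        flags = mad_outliers([m[name] for m in metrics], nmads, direction, log)
        keep = [k and not f for k, f in zip(keep, flags)]
    return keep


# Toy data: five ordinary cells and one low-quality cell (the last one).
n_genes = [2000, 2200, 1800, 2100, 1900, 150]
total_umi = [5000, 5500, 4500, 5200, 4800, 400]
pct_mito = [3.0, 4.0, 2.5, 3.5, 3.2, 3.1]
pct_rrna = [0.5, 0.6, 0.4, 0.7, 0.5, 0.6]
metrics = [
    {"n_genes": g, "total_umi": u, "pct_mito": m, "pct_rrna": r}
    for g, u, m, r in zip(n_genes, total_umi, pct_mito, pct_rrna)
]
print(keep_cells(metrics))  # [True, True, True, True, True, False]
```

Because the cutoffs come from each sample's own median and spread, tumor, normal, and adjacent samples can be filtered separately without choosing one fixed threshold for all of them. The `direction` argument allows one-sided filtering (e.g. only high mitochondrial percentages) if you decide that suits a metric better. If a metric has almost no spread, the MAD approaches zero and even small deviations get flagged, which is one more reason to check the cutoffs on the violin plots.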

Seurat is probably OK; I personally prefer the Bioconductor workflow because its documentation is just so much better for me: http://bioconductor.org/books/release/OSCA/