The number of unique genes detected in each cell, and the total number of molecules detected within a cell.
They then refer to them as nCount_RNA and nFeature_RNA, but I'm not sure which is which. So my questions are:
1.) What is nCount_RNA and what is nFeature_RNA?
2.) Later in the pipeline, when you're normalizing the data, it says that it "normalizes the feature expression measurements for each cell by the total expression." Can anybody explain that?
nFeature_RNA is the number of genes detected in each cell. nCount_RNA is the total number of molecules detected within a cell. Low nFeature_RNA for a cell indicates that it may be dead/dying or an empty droplet. High nCount_RNA and/or nFeature_RNA indicates that the "cell" may in fact be a doublet (or multiplet). Combined with the percentage of mitochondrial reads, removing outliers on these metrics removes most doublets, dead cells, and empty droplets, which is why filtering is a common pre-processing step.
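As a minimal sketch of how the two metrics relate to a raw count matrix (plain Python, not Seurat's own R code; the toy matrix and the helper name `qc_metrics` are invented for illustration), nCount_RNA corresponds to the per-cell sum of counts and nFeature_RNA to the number of genes with a nonzero count in that cell:

```python
# Toy genes-by-cells count matrix (rows = genes, columns = cells).
# The values are invented purely for illustration.
counts = [
    [3, 0, 1],  # gene A
    [0, 0, 5],  # gene B
    [2, 1, 0],  # gene C
    [0, 0, 4],  # gene D
]


def qc_metrics(matrix):
    """Return (n_count, n_feature) per cell for a genes-by-cells matrix.

    n_count   ~ nCount_RNA:   sum of all counts in the cell
    n_feature ~ nFeature_RNA: number of genes with a count above zero
    """
    n_cells = len(matrix[0])
    n_count = [sum(row[j] for row in matrix) for j in range(n_cells)]
    n_feature = [sum(1 for row in matrix if row[j] > 0) for j in range(n_cells)]
    return n_count, n_feature


n_count, n_feature = qc_metrics(counts)
print("nCount-like totals:  ", n_count)
print("nFeature-like totals:", n_feature)
```

On raw integer counts, every detected gene contributes at least 1 to the sum, so the count total for a cell can never be smaller than its number of detected genes.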
The NormalizeData step is basically just ensuring that expression values are on a comparable scale across cells. By default (the LogNormalize method), it divides the counts for each gene by the total counts in the cell, multiplies that value by the scale.factor (10,000 by default), and then natural-log-transforms the result using log1p, i.e. log(1 + x).
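The steps described above can be sketched in plain Python. This is an illustration of the arithmetic only, not Seurat's implementation; the function name `log_normalize` and the toy matrix are assumptions made for the example:

```python
import math


def log_normalize(matrix, scale_factor=10_000):
    """Log-normalize a genes-by-cells count matrix, one cell (column) at a time.

    For each cell: divide each gene's count by the cell's total count,
    multiply by scale_factor, then apply log1p (natural log of 1 + x).
    """
    n_cells = len(matrix[0])
    totals = [sum(row[j] for row in matrix) for j in range(n_cells)]
    return [
        [
            math.log1p(row[j] / totals[j] * scale_factor) if totals[j] else 0.0
            for j in range(n_cells)
        ]
        for row in matrix
    ]


# Two toy cells with the same gene proportions but a 10x difference in depth.
counts = [
    [1, 10],  # gene A
    [3, 30],  # gene B
]
normalized = log_normalize(counts)
for gene, row in zip(["gene A", "gene B"], normalized):
    print(gene, row)
```

The two cells differ tenfold in total counts but have identical gene proportions, so they end up with the same normalized values. That is the sense in which expression becomes comparable across cells.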
I stumbled upon this question and had a follow-up question.
I am re-analysing a single-cell RNA-seq dataset with two samples (with and without treatment) and have downloaded the preprocessed data from GEO as two .csv files. The authors state that these files contain matrices that have been through QC, log-normalized, and scaled.
After creating a Seurat object for each dataset, I checked nFeature_RNA and nCount_RNA for both and found that nFeature_RNA is around twice as large as nCount_RNA. I can't explain this.
As I understand it, nCount_RNA is the number of UMIs, and I can't find anything online that says otherwise.
If nCount_RNA is the UMI count, and there are only half as many UMIs as genes detected, how were the genes detected? I believe two RNA molecules from different genes can't be detected by the same UMI.
I attach a plot of nCount_RNA against nFeature_RNA and hope someone with a kind heart can clarify this. If it helps, these cells should be endothelial cells.
How is nCount_RNA different from library size? Thanks!