I am re-analysing a publicly available single-cell RNA-seq dataset with two samples (with and without treatment) and have downloaded the preprocessed data from the GEO dataset as two .csv files. The authors state that these files contain matrices that have been QC-filtered, log-normalized, and scaled.
After creating a Seurat object for each dataset, I checked nFeature_RNA and nCount_RNA and found that nFeature_RNA is roughly twice nCount_RNA in both. I can't explain this. As I understand it, nCount_RNA is the number of UMIs, and I can't find anything online that says otherwise. If nCount_RNA is the UMI count, and a cell has only half as many UMIs as detected genes, how can that many genes have been detected? I believe that two RNA molecules from different genes cannot be tagged by the same UMI. In other questions online, I have seen cell complexity defined as log10(nFeature_RNA) / log10(nCount_RNA), with values > 0.8 expected. Maybe it is my mathematical understanding that is failing me.
I attach a plot of nCount_RNA against nFeature_RNA and hope someone with a kind heart can explain how nFeature_RNA can be twice nCount_RNA for a given cell. If it helps, these cells should be endothelial cells from tumors.
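To make the two metrics concrete, here is a minimal sketch in Python with NumPy (not Seurat itself). It assumes nCount_RNA is the per-cell column sum of whatever matrix the object was built from, and nFeature_RNA is the number of genes with a value above zero in that cell. The toy cell, the helper names `per_cell_metrics` and `log_normalize`, and the choice of a natural-log transform with a scale factor of 10,000 are illustrative assumptions, not the authors' actual pipeline. On raw integer counts the sum can never be smaller than the number of detected genes, but on values that have already been log-normalized the column sum is no longer a UMI total and can fall below it.

```python
import numpy as np


def per_cell_metrics(matrix):
    """Return (n_count, n_feature) per cell for a genes x cells matrix."""
    n_count = matrix.sum(axis=0)          # column sum, what nCount_RNA reports
    n_feature = (matrix > 0).sum(axis=0)  # genes above zero, nFeature_RNA
    return n_count, n_feature


def log_normalize(counts, scale_factor=10_000):
    """Illustrative log-normalization: log1p(count / cell total * scale_factor)."""
    totals = counts.sum(axis=0, keepdims=True)
    return np.log1p(counts / totals * scale_factor)


# Hypothetical cell: 6000 genes with 1 UMI, 2000 genes with 2 UMIs, 2000 undetected.
toy_counts = np.array([1] * 6000 + [2] * 2000 + [0] * 2000, dtype=float).reshape(-1, 1)

raw_count, raw_feature = per_cell_metrics(toy_counts)
norm_count, norm_feature = per_cell_metrics(log_normalize(toy_counts))

print("raw counts:     nCount =", raw_count[0], " nFeature =", raw_feature[0])
print("log-normalized: nCount =", round(float(norm_count[0]), 1), " nFeature =", norm_feature[0])
```

If this is what is happening with the downloaded files, nCount_RNA in my objects would be a sum of normalized values rather than a number of UMIs. An additional scaling step could change the sums further, and I have not modelled that here.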
Thank you in advance. /Maibritt