Hi, I am new to bioinformatics and I have a few silly questions about how to define thresholds for QC metrics in Seurat.
I noticed that Seurat filters out cells that have unique feature counts over 2,500 or less than 200. I have also heard that we need to apply different QC thresholds to each dataset, chosen by looking at the distribution of the number of genes and the distribution of UMIs per cell.
So I guess that applying a different maximum number of genes to each dataset means that a "high gene count" is defined relative to each dataset. For example,
nFeature_RNA < 2000 is the upper threshold in dataset A, but
nFeature_RNA < 3000 is the upper threshold in dataset B (i.e., 2000 is not considered high in dataset B).
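To make my understanding concrete, here is a minimal sketch in plain Python (not Seurat code) of what I mean by a threshold that is relative to each dataset. The rule "median + 3 MADs", the function name, and the gene counts are all made up by me for illustration; I am not claiming this is how Seurat or anyone else chooses the cutoff.

```python
from statistics import median

def upper_cutoff(genes_per_cell, n_mads=3.0):
    """Hypothetical data-driven upper limit: median + n_mads * MAD."""
    med = median(genes_per_cell)
    # Median absolute deviation: the typical distance of a cell from the median.
    mad = median([abs(g - med) for g in genes_per_cell])
    return med + n_mads * mad

# Toy numbers of detected genes per cell (invented, not real data).
dataset_a = [800, 900, 1000, 1100, 1200, 1000, 950, 1050, 4000]
dataset_b = [1800, 2000, 2200, 2400, 2600, 2200, 2100, 2300, 6000]

cutoff_a = upper_cutoff(dataset_a)
cutoff_b = upper_cutoff(dataset_b)

# The same cell with 2000 detected genes is judged differently in each dataset.
cell = 2000
print("A:", cutoff_a, "-> flagged as high" if cell > cutoff_a else "-> kept")
print("B:", cutoff_b, "-> flagged as high" if cell > cutoff_b else "-> kept")
```

With these toy numbers, a cell with 2000 genes is above the cutoff in dataset A but below it in dataset B, which is the situation I am asking about.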
But I don't understand why we need to set a different upper limit on the number of genes for each dataset. Is there something biologically wrong with a high gene count?
As a non-biology major, I don't understand why the upper limits for nFeature (genes) and nCount (UMIs) change from dataset to dataset.
I know this is pretty silly, but if anyone can explain the biological concept behind these numbers, I would really appreciate it. Thank you very much.