16 months ago by
Institute for Cancer Research, Dept. of Tumor Biology, Oslo University Hospital, Oslo, Norway
I found that changing x.low.cutoff from 0.0125 (the value used in the PBMC 3k tutorial) to, for example, 0.1 has a huge effect on the number of variable genes.
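@igor, this is bound to happen, because far more genes have an average expression close to 0 (as the dense cluster of points at that end of the plot shows), so moving the lower cutoff through that region adds or removes many genes at once.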
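To see why the lower cutoff matters so much, here is a minimal base R sketch on simulated counts (the data, gene names, and the helper `count.hvg` are hypothetical). It mimics the general shape of Seurat v2's mean/dispersion filter: ExpMean-style means, LogVMR-style dispersions, dispersions z-scored within 20 equal-width mean bins, then the x/y cutoffs applied. It is a simplified approximation for illustration, not Seurat's own code.

```r
set.seed(1)

# Hypothetical UMI-like data: 2000 genes x 300 cells, most genes lowly expressed
n.genes <- 2000
n.cells <- 300
gene.mu <- rlnorm(n.genes, meanlog = -2, sdlog = 1.5)
counts <- matrix(rnbinom(n.genes * n.cells, size = 2, mu = rep(gene.mu, times = n.cells)),
                 nrow = n.genes,
                 dimnames = list(paste0("gene", seq_len(n.genes)), NULL))
counts <- counts[rowSums(counts) > 0, , drop = FALSE]   # drop genes never detected

# Log-normalize: log1p(count / library size * 10000)
log.data <- log1p(sweep(counts, 2, colSums(counts), "/") * 1e4)

# Per-gene statistics in the spirit of ExpMean and LogVMR
exp.mean <- function(x) log(mean(exp(x) - 1) + 1)
log.vmr  <- function(x) log(var(exp(x) - 1) / mean(exp(x) - 1))
gene.mean <- apply(log.data, 1, exp.mean)
gene.disp <- apply(log.data, 1, log.vmr)

# z-score dispersions within 20 equal-width bins of mean expression
bins <- cut(gene.mean, breaks = 20)
bin.mean <- tapply(gene.disp, bins, mean)
bin.sd   <- tapply(gene.disp, bins, sd)
disp.scaled <- (gene.disp - bin.mean[as.integer(bins)]) / bin.sd[as.integer(bins)]
disp.scaled[is.na(disp.scaled)] <- 0   # e.g. bins holding a single gene

# Hypothetical helper: count genes passing the x/y cutoffs
count.hvg <- function(x.low, x.high = 3, y.cutoff = 0.5) {
  sum(gene.mean > x.low & gene.mean < x.high & disp.scaled > y.cutoff)
}

n.low  <- count.hvg(0.0125)
n.high <- count.hvg(0.1)
cat("x.low.cutoff = 0.0125:", n.low, "genes\n")
cat("x.low.cutoff = 0.1   :", n.high, "genes\n")
```

Because the binning does not depend on the cutoffs, raising x.low.cutoff can only shrink the selected set; how much it shrinks depends on how many genes sit in the low-mean region.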
I was facing the same problem, and after some googling I found this:
Cutoff to find out number of variable genes? https://github.com/satijalab/seurat/issues/634
"All methods for HVG selection have some cutoff parameters, and unfortunately, criteria for 'optimality' are difficult to identify.
If you have UMI data, we suggest identifying HVG on the basis of variance-to-mean ratio, as we demonstrate here: https://satijalab.org/seurat/mca.html
Again, with UMI datasets - we typically do not notice large differences in the analysis depending on the exact number of genes selected- ranging from 2k genes to even the full transcriptome."
And if we follow the link mentioned in that discussion, we find this:
We perform standard log-normalization.
mca <- NormalizeData(object = mca, normalization.method = "LogNormalize", scale.factor = 10000)
FindVariableGenes calculates the variance and mean for each gene in the dataset (storing this in object@hvg.info), and sorts genes by their variance/mean ratio (VMR). We have observed that for large-cell datasets with unique molecular identifiers, selecting highly variable genes (HVG) simply based on VMR is an efficient and robust strategy. Here, we select the top 1,000 HVG for downstream analysis.
mca <- FindVariableGenes(object = mca, mean.function = ExpMean, dispersion.function = LogVMR,
do.plot = FALSE)
hv.genes <- head(rownames(mca@hvg.info), 1000)
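The "rank by VMR and take the top N" idea can be sketched in plain R without Seurat. The toy counts below are hypothetical, and the code only approximates what the tutorial describes: it ranks on the raw ratio rather than its log, which gives the same order because log is monotonic.

```r
set.seed(42)

# Hypothetical toy data: 500 genes x 100 cells of negative-binomial counts
n.genes <- 500
n.cells <- 100
gene.mu <- rlnorm(n.genes, meanlog = 0, sdlog = 1)
counts <- matrix(rnbinom(n.genes * n.cells, size = 1, mu = rep(gene.mu, times = n.cells)),
                 nrow = n.genes,
                 dimnames = list(paste0("gene", seq_len(n.genes)), NULL))
counts <- counts[rowSums(counts) > 0, , drop = FALSE]

# Log-normalize with scale factor 10000, as in the NormalizeData call above
log.data <- log1p(sweep(counts, 2, colSums(counts), "/") * 1e4)

# Variance-to-mean ratio computed on the de-logged (normalized) values
lin <- expm1(log.data)
vmr <- apply(lin, 1, var) / rowMeans(lin)

# Keep the top n genes by VMR (the tutorial uses 1000; 100 suits this toy data)
top.n <- 100
hv.genes <- head(names(sort(vmr, decreasing = TRUE)), top.n)
```

With a fixed top-N rule there is no x.low.cutoff to tune, which is one reason it is less sensitive than the cutoff-based selection discussed above.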
Also, since I found the plot pretty useless, I skipped it by adding "do.plot = FALSE":
pbmc <- FindVariableGenes(object = pbmc, mean.function = ExpMean, dispersion.function = LogVMR,
    x.low.cutoff = 0.0125, x.high.cutoff = 3, y.cutoff = 0.5, do.plot = FALSE)
This still selects variable genes using the cutoffs above, but skips the plot.