Following a single-cell RNA-seq workshop, I created a Seurat object (my_data), normalized the data, and then tried to identify highly variable genes using two different R packages:
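
    library(Seurat)
    library(M3Drop)
    library(magrittr) # provides %>%

    # Top 2000 variable genes according to Seurat (vst method)
    variable_genes_Seurat <- my_data %>%
      FindVariableFeatures(selection.method = 'vst', nfeatures = 2000) %>%
      VariableFeatures()

    # Top 2000 genes according to M3Drop's NBumi model, fitted on raw counts
    variable_genes_M3Drop <- my_data %>%
      GetAssayData('counts') %>% # unnormalized
      NBumiConvertData() %>%
      NBumiFitModel() %>%
      NBumiFeatureSelectionCombinedDrop(ntop = 2000) %>%
      rownames()
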
I compared the results and found that the two gene lists differ substantially: they share only 588 of 2000 genes.

    shared_variable_genes <- intersect(variable_genes_Seurat, variable_genes_M3Drop)
    length(shared_variable_genes)

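To put a number on the overlap, a small base-R helper along these lines could be used. This is only a sketch: `overlap_summary` is a hypothetical name (it is not part of Seurat or M3Drop), and the toy gene names stand in for the real lists.

```r
# Hypothetical helper (not from Seurat or M3Drop): summarize how much
# two feature lists overlap.
overlap_summary <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  n_shared <- length(intersect(a, b))
  n_union <- length(union(a, b))
  list(
    n_shared  = n_shared,             # genes present in both lists
    frac_of_a = n_shared / length(a), # share of list a that is also in b
    frac_of_b = n_shared / length(b), # share of list b that is also in a
    jaccard   = n_shared / n_union    # intersection over union
  )
}

# Toy gene names standing in for the two real 2000-gene lists
toy_a <- c("GeneA", "GeneB", "GeneC", "GeneD")
toy_b <- c("GeneC", "GeneD", "GeneE", "GeneF")
res <- overlap_summary(toy_a, toy_b)
```

Applied to my two lists, 588 shared genes out of 2000 per list corresponds to 29.4% of each list, or a Jaccard index of 588 / 3412 ≈ 0.17.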
I wonder why the results are so different, and which feature list (variable_genes_Seurat or variable_genes_M3Drop) I should use for dimensionality reduction and clustering.