Forum: Feature Selection and Dimensionality Reduction in 10X scRNA-seq data
ATpoint45k wrote:

I am looking for opinions (based on hands-on experience) about your favourite feature selection (followed by dimensionality reduction) method for 10X-based scRNA-seq data. The motivation is that I recently stumbled upon the GLM-PCA approach from Rafael Irizarry's lab (links at the bottom of the post), which made me dive into the literature. As expected, there are plenty of methods out there, each claiming superior performance. Since GLM-PCA operates on raw counts, it frees the user from choosing one of the many normalization strategies, such as those implemented in scran or the options provided by Seurat. This is admittedly not a precise question (hence the Forum post), and I hope to start a discussion about your current best practices that users inexperienced in the single-cell world (myself included) can take inspiration from.


GLM-PCA:

Paper: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1861-6

Git: https://github.com/willtownes/scrna2019

CRAN: https://cran.r-project.org/web/packages/glmpca/index.html
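
For concreteness, here is roughly how I understand the CRAN package is meant to be called on raw counts (a minimal sketch on made-up toy data; the number of dimensions and the Poisson family are arbitrary illustrative choices on my part, not recommendations from the paper):

```r
# Minimal sketch: GLM-PCA fit directly on raw UMI counts, no normalization step.
# `counts` is a toy genes x cells matrix; replace it with your own 10X count matrix.
library(glmpca)

set.seed(1)
counts <- matrix(rpois(2000 * 100, lambda = 0.5), nrow = 2000, ncol = 100)
counts <- counts[rowSums(counts) > 0, ]      # drop all-zero genes before fitting

fit <- glmpca(counts, L = 10, fam = "poi")   # 10 latent dimensions, Poisson likelihood

dim(fit$factors)     # cells x 10, usable like PC scores for clustering/UMAP
dim(fit$loadings)    # genes x 10
```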

modified 6 weeks ago • written 13 months ago by ATpoint45k

will.townes50 wrote:

Hi, thanks for your interest in GLM-PCA (I'm one of the authors). First of all, GLM-PCA is a dimension reduction method meant to be as similar to PCA as possible, but using a count-based likelihood (or loss function) instead of the implicit normal-distribution likelihood of PCA. Since you seem to be mostly interested in feature selection (i.e., identifying highly informative genes), I encourage you to check out our R package scry (soon to be submitted to Bioconductor), which includes feature selection based on deviance as an alternative to the more traditional "highly variable genes" approach. As you mention, it operates on raw UMI counts, so there is no need for normalization, and according to a recent comparison by an independent research group it performs well against competing methods.

The scry package also includes a null residuals transformation (similar to the sctransform method from Hafemeister et al.) that can be fed directly to traditional PCA instead of normalized counts. The null residuals are basically a rough approximation to GLM-PCA that is much faster to compute. Alternatively, if you have another normalization/dimension reduction scheme in mind, you can just use the deviance feature selection to choose, say, the top 2,000 genes and then do whatever you like with those.

As a side note, we are actively working to improve the scalability and numerical stability of the GLM-PCA optimization routine, so stay tuned for updates in the future.
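
To make the feature-selection workflow concrete, here is a minimal sketch on toy data (deviance feature selection, then either null residuals + ordinary PCA, or GLM-PCA on the selected genes). Exact argument, column, and assay names may differ slightly between scry versions:

```r
# Minimal sketch, assuming a SingleCellExperiment with raw UMI counts in the
# "counts" assay. Toy data is used here; all parameter choices are illustrative.
library(scry)
library(SingleCellExperiment)

set.seed(1)
counts <- matrix(rpois(5000 * 200, lambda = 0.5), nrow = 5000,
                 dimnames = list(paste0("gene", 1:5000), paste0("cell", 1:200)))
sce <- SingleCellExperiment(assays = list(counts = counts))
sce <- sce[rowSums(counts(sce)) > 0, ]

# 1) Rank genes by binomial deviance (computed from raw counts, no normalization)
#    and keep the top 2,000.
sce <- devianceFeatureSelection(sce, assay = "counts", fam = "binomial")
keep <- rank(-rowData(sce)$binomial_deviance) <= 2000
sce <- sce[keep, ]

# 2a) Fast approximate route: null residuals fed into ordinary PCA.
sce <- nullResiduals(sce, assay = "counts", fam = "binomial", type = "deviance")
resid_assay <- setdiff(assayNames(sce), "counts")[1]   # residuals assay added above
pca <- prcomp(t(as.matrix(assay(sce, resid_assay))), rank. = 30)

# 2b) Or run GLM-PCA itself on the selected raw counts (slower, more exact):
# sce <- GLMPCA(sce, L = 30)
```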

written 11 months ago by will.townes50

Thanks, will.townes, for the pointer to the scry package. Will try it.

modified 5 months ago • written 9 months ago by ATpoint45k

igor12k wrote:

I've seen the benefits claimed for GLM-PCA, and I believe there are at least some scenarios where it does perform better. However, does it actually uncover new biological insights? Many single-cell methods show significant improvements on some metrics and look impressive on paper, but very few would actually change conclusions that were reached with classic techniques.

Personal anecdote: I tried not normalizing the data at all and expected completely nonsensical results. However, the major populations still clearly segregated.
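
To illustrate what I mean by "not normalizing at all", here is one way this experiment could look in a standard Seurat workflow (a sketch with placeholder path and parameters, not necessarily the exact pipeline I used):

```r
# Rough sketch: standard Seurat clustering with the normalization step skipped,
# so downstream steps operate on raw counts. Path and parameters are placeholders.
library(Seurat)

counts <- Read10X("path/to/filtered_feature_bc_matrix")   # replace with your own data
so <- CreateSeuratObject(counts = counts)

# NormalizeData(so) is deliberately omitted here.
so <- FindVariableFeatures(so, nfeatures = 2000)
so <- ScaleData(so)
so <- RunPCA(so, npcs = 30)
so <- FindNeighbors(so, dims = 1:30)
so <- FindClusters(so)
so <- RunUMAP(so, dims = 1:30)
DimPlot(so, label = TRUE)   # major populations still segregate
```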

written 13 months ago by igor12k

Personal anecdote: I tried not normalizing the data at all and expected completely nonsensical results. However, the major populations still clearly segregated.

That is an interesting observation indeed. Have you tried it on more than one dataset (n > 1) to see if it is widely applicable?

written 13 months ago by GenoMax96k

I have not experimented much with it. I've been meaning to run a more comprehensive analysis, but more pressing tasks get in the way.

written 13 months ago by igor12k

My expectation is that you'd see fairly significant sample-to-sample effects with zero normalization, but I'd be interested in seeing whether that's actually true.

written 13 months ago by jared.andrews078.6k

That may be true. I normally see sample-to-sample effects regardless of normalization (without some sort of batch-correction method like CCA/MNN/etc.).

written 13 months ago by igor12k

I think it also depends on the sample. Normal PBMCs are fairly consistent between samples through standard pipelines without batch correction, assuming they're processed fairly close together by the same person. Disease samples are a different story, though.

written 13 months ago by jared.andrews078.6k

Agreed. High-quality healthy samples processed the same way tend to be fairly consistent.

modified 13 months ago • written 13 months ago by igor12k