How many principal components (PCs) should be used to build the neighbourhood graph for t-SNE/UMAP in single-cell RNA-seq analysis?
Note that I'm not concerned with performance in terms of speed or memory. I'm interested in accuracy, i.e. denoising the data without removing relevant biological signal.
Options I have seen:
- A fixed lower number, e.g. 10
- A fixed higher number, e.g. 50 (the default in Scanpy)
- Elbow plot visual estimate
- Elbow plot statistical cutoff (which one?)
- JackStraw (e.g. in Seurat)
- Molecular Cross Validation (https://www.biorxiv.org/content/10.1101/786269v1)
- Use ICA instead of PCA (but how many ICs?)
- It doesn't matter (show me the evidence)
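For the "elbow plot statistical cutoff" option, one simple heuristic is to draw a straight line (chord) from the first to the last point of the scree curve and take the PC that lies farthest from that line. The sketch below assumes this maximum-distance-to-chord rule. It is only one of several possible elbow rules and is not claimed to be what Scanpy or Seurat do internally. The function name `elbow_by_max_distance` is hypothetical.

```python
import numpy as np


def elbow_by_max_distance(variance_ratio):
    """Return the number of PCs at the 'elbow' of a scree curve.

    Hypothetical helper: picks the PC whose point is farthest (perpendicular
    distance) from the chord joining the first and last points of the curve.
    """
    y = np.asarray(variance_ratio, dtype=float)
    if y.ndim != 1 or y.size < 3:
        raise ValueError("need a 1-D curve with at least 3 points")
    if np.isclose(y.max(), y.min()):
        raise ValueError("curve is flat; no elbow to find")

    x = np.arange(1, y.size + 1, dtype=float)  # 1-based PC indices

    # Rescale both axes to [0, 1] so the result does not depend on units.
    xn = (x - x[0]) / (x[-1] - x[0])
    yn = (y - y.min()) / (y.max() - y.min())

    # Unit direction of the chord from the first point to the last point.
    p0 = np.array([xn[0], yn[0]])
    p1 = np.array([xn[-1], yn[-1]])
    d = (p1 - p0) / np.linalg.norm(p1 - p0)

    # Perpendicular distance of each point to the chord (2-D cross product).
    rel = np.column_stack([xn, yn]) - p0
    dist = np.abs(rel[:, 0] * d[1] - rel[:, 1] * d[0])

    return int(np.argmax(dist)) + 1  # convert 0-based index to a PC count


# Toy scree curve (fractions of variance explained), not real data.
curve = [1.0, 0.5, 0.25, 0.05, 0.04, 0.03, 0.02, 0.01]
print(elbow_by_max_distance(curve))  # prints 4
```

With Scanpy, the curve could presumably be taken from `adata.uns["pca"]["variance_ratio"]` after `sc.pp.pca`, and the result passed as `n_pcs` to `sc.pp.neighbors`. Note that this rule only locates the bend in the variance curve. It says nothing about whether the discarded PCs carry relevant biological signal, which is the actual question here.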