For any single-cell workflow, a big step is choosing the number of principal components (PCs) to carry into downstream analysis. However, one approach that does not make sense to me is when people say to use heatmaps. For starters, what exactly are we looking for when we compare a heatmap for, say, 5 PCs against another for 10? What makes one better than the other? Any help would be great!
Each PC accounts for a proportion of the overall variation in the dataset. So, simply saying 'choose 5' or 'choose 10' PCs makes no sense when one considers the possibility that 80% of the variation may be accounted for by the first 3 PCs... or may require the first 20. I can only assume [and hope] that whichever program you are using outputs or stores the % variation explained by each PC.
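As a rough sketch of what I mean, here is how you could inspect the % variation explained per PC and pick a cutoff from the cumulative curve. This uses scikit-learn's `PCA` on a simulated cells-by-genes matrix; the 80% threshold, the matrix `X`, and the variable names are just illustrative assumptions, not a recommendation for any particular tool:

```python
# Sketch: inspect the % variation explained by each PC and choose a cutoff.
# `X` is a toy cells x genes matrix with a planted low-rank signal; in a real
# analysis it would be your (normalised) expression matrix.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
scores = rng.normal(size=(500, 5)) * np.array([10, 8, 6, 4, 2])  # 5 true factors
loadings = rng.normal(size=(5, 200))
X = scores @ loadings + rng.normal(size=(500, 200))  # 500 cells, 200 genes

pca = PCA(n_components=20).fit(X)
var_pct = pca.explained_variance_ratio_ * 100  # % variance per PC
cum_pct = np.cumsum(var_pct)                   # cumulative % variance

# One possible rule (illustrative): keep enough PCs to explain >= 80%.
n_pcs = int(np.searchsorted(cum_pct, 80) + 1)
```

Note that with real single-cell data the cumulative curve flattens much more slowly, which is exactly why the % variation per PC, not a fixed '5 vs 10', should drive the decision.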
Specifically in relation to the heatmaps, one is looking for 'structure' and patterns in each respective heatmap, which would imply that the PC provides some discriminatory power across the cells. Still, I would not choose the top 'n' PCs based on just 'looking' at each PC heatmap; there are metrics that can be employed to do this.
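To give a flavour of such a metric, here is one common 'elbow' heuristic on the scree curve: take the PC whose point lies farthest from the straight line joining the first and last points of the % variance curve. This is a generic toy illustration, not the exact method of any specific package:

```python
# Simple elbow heuristic: the PC farthest (perpendicular distance) from the
# line joining the first and last points of the scree curve.
import numpy as np

def elbow_point(var_pct):
    """Return the 1-based index of the elbow of a decreasing scree curve."""
    y = np.asarray(var_pct, dtype=float)
    x = np.arange(len(y), dtype=float)
    p1 = np.array([x[0], y[0]])
    p2 = np.array([x[-1], y[-1]])
    line = (p2 - p1) / np.linalg.norm(p2 - p1)  # unit vector along the line
    vecs = np.column_stack([x, y]) - p1
    proj = np.outer(vecs @ line, line)          # projection onto the line
    dists = np.linalg.norm(vecs - proj, axis=1)  # perpendicular distances
    return int(np.argmax(dists)) + 1

# Toy scree curve (% variance per PC): steep drop, then a flat tail.
scree = [30.0, 20.0, 10.0, 4.0, 3.0, 2.5, 2.0, 1.8, 1.6, 1.5]
k = elbow_point(scree)  # elbow lands where the curve flattens out
```

Metrics like this (and more principled ones, such as permutation-based tests) give a reproducible cutoff, whereas eyeballing heatmaps does not.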
This posting goes over it quite well in relation to Seurat:
I provide other metrics in my own package, which is independent of any scRNA-seq analysis tool: