I'm working on a project that involves analyzing scRNA-seq data. A large part of it consists of clustering cells, identifying differentially expressed (DE) genes between clusters, running pathway analysis on those DE genes, and so on. I am planning to use Seurat for the analysis. By default, Seurat clusters cells with the graph-based Louvain algorithm. As far as I understand, the neighbor graph is built in the reduced-dimensional space (the PCA space by default) rather than on the 2D t-SNE or UMAP embedding, and that same reduction is also the input to t-SNE or UMAP. So it seems important that this initial reduction is as accurate as possible, so that both the clusters and the embedding are as accurate as possible.
Prior to running t-SNE or UMAP, Seurat's vignettes recommend running PCA to reduce the dimensionality of the input dataset while preserving most of its important structure. Seurat is certainly not the only pipeline that does this; most analysis pipelines I have seen apply PCA before t-SNE / UMAP in much the same way. However, it also seems to me that ICA is generally better than PCA at separating cells based on the activation of gene modules. This makes sense to me in principle, since gene modules should behave more like statistically independent gene combinations (as modeled by ICA) than like orthogonal gene combinations (as modeled by PCA). It also seems to hold in practice: I've read a few papers presenting empirical evidence that ICA is better than PCA at differentiating cells based on gene module activation.

Assuming this is correct, would it make more sense to use ICA rather than PCA for the dimensionality reduction that precedes t-SNE / UMAP? Or is there a compelling reason, one that I am simply unaware of, why most people use PCA for this?
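To make the comparison concrete, here is a minimal sketch of the kind of test I have in mind. It uses scikit-learn on synthetic data rather than Seurat on real counts, so everything in it is an assumption for illustration: the toy "gene modules", the noise level, the number of components, and the helper `best_abs_corr` are all hypothetical. It checks how well each method's components line up with the simulated module activities.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)

# Hypothetical toy data: 300 "cells" x 50 "genes", driven by 3 gene modules
# whose activities are non-Gaussian (exponential), plus Gaussian noise.
n_cells, n_genes, n_modules = 300, 50, 3
activity = rng.exponential(scale=1.0, size=(n_cells, n_modules))
loadings = rng.normal(size=(n_modules, n_genes))
expr = activity @ loadings + 0.1 * rng.normal(size=(n_cells, n_genes))

# Reduce to the same number of dimensions with each method.
pca_emb = PCA(n_components=n_modules, random_state=0).fit_transform(expr)
ica_emb = FastICA(n_components=n_modules, random_state=0, max_iter=1000).fit_transform(expr)


def best_abs_corr(embedding, truth):
    """For each true module, the largest |correlation| with any component."""
    k = embedding.shape[1]
    # Rows of the cross-block are components, columns are true modules.
    cross = np.corrcoef(embedding.T, truth.T)[:k, k:]
    return np.abs(cross).max(axis=0)


pca_score = best_abs_corr(pca_emb, activity)
ica_score = best_abs_corr(ica_emb, activity)
print("PCA best |r| per module:", np.round(pca_score, 2))
print("ICA best |r| per module:", np.round(ica_score, 2))
```

Either embedding could then be passed to the neighbor graph and to t-SNE / UMAP. In Seurat, I believe the analogous step would be `RunICA` followed by `FindNeighbors` and `RunUMAP` with `reduction = "ica"`, but I have not verified that this behaves well on real data.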