Criteria for performing PCA or ICA
2.1 years ago
hamid sta ▴ 10

Hello, I'm currently analyzing scRNA-seq data. After quality control of my cells, I did the standard normalization / scaling / variable feature selection with sctransform, the tool developed by the Satija lab. For the dimensionality reduction step, I started with PCA, but the clustering was not clear. Based on well-known marker genes, it appears that different cell types (in my case the "x zone" and "fasciculata") are not clustered separately, which is not what I expected. Increasing the clustering resolution just creates additional "sub" clusters that are too similar to annotate as different cell types.

However, after performing ICA instead of PCA, my two cell types ("x zone" and "fasciculata") are well separated into two different clusters, which is perfect, and all my marker genes match the clustering.

So it appears that ICA is better suited to my dataset. That's why I wonder whether there is any statistical criterion or evidence that could help me understand why ICA works better than PCA in my case?

Also, the proportion of variance explained is only 13% with 50 PCs and 11% with 20 PCs, which is really low. Maybe it's related to this?

Thank you.

ICA scRNA PCA • 580 views
2.1 years ago
Mensur Dlakic ★ 27k

Dimensionality reduction methods do exactly what their name implies. The first PCA component is the direction that explains the most variance in the data, and each subsequent PC is the next best direction orthogonal to the earlier ones. In PCA there is a ranking of components, and we know to include early PCs before late ones. When you get only 13% variability explained with 50 PCs, either there is no structure in your data, or it is very noisy.
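To make the ranking concrete, here is a minimal NumPy sketch of how the per-PC and cumulative fraction of variance explained can be computed. The data are synthetic and purely hypothetical (two latent factors driving 50 "genes" plus noise), and the helper name `explained_variance_ratio` is my own, not part of any scRNA-seq package.

```python
import numpy as np

def explained_variance_ratio(X, n_components):
    """Fraction of total variance captured by each of the first PCs.

    X is a (cells x genes) matrix; columns are centered before the SVD.
    """
    Xc = X - X.mean(axis=0)
    # Singular values are returned in decreasing order, which is
    # exactly the ranking of the principal components.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var[:n_components] / var.sum()

# Synthetic stand-in for a scaled expression matrix (hypothetical data):
# two latent factors drive 50 "genes", plus independent noise.
rng = np.random.default_rng(0)
n_cells, n_genes = 300, 50
factors = rng.normal(size=(n_cells, 2))
loadings = rng.normal(size=(2, n_genes))
X = factors @ loadings + rng.normal(scale=0.5, size=(n_cells, n_genes))

ratio = explained_variance_ratio(X, n_components=10)
cumulative = np.cumsum(ratio)
print("per-PC ratio:", np.round(ratio, 3))
print("cumulative  :", np.round(cumulative, 3))
```

In this toy case the first two PCs carry most of the variance because the structure was built in. A cumulative curve that stays low and flat is the pattern described above for noisy or unstructured data.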

ICA is more of a data separation technique than a data compression technique. Its components are not ranked and are required collectively to separate the data. You may want to do PCA and feed its first 50-100 PCs into ICA. Although ICA is adept at extracting signals from noisy data, it may help to let PCA take a first crack at removing the noise.
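A rough sketch of the PCA-then-ICA idea, assuming scikit-learn is available. The data are simulated (a bimodal "cell type" source and a heavy-tailed source mixed into 100 hypothetical genes), and the component counts are scaled down for the toy example; none of this is the Seurat/sctransform workflow itself.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
n_cells, n_genes = 500, 100

# Hypothetical data: two independent, non-Gaussian sources (a bimodal
# "cell type" signal and a heavy-tailed one) mixed into 100 genes.
labels = rng.integers(0, 2, size=n_cells)
s1 = np.where(labels == 1, 2.0, -2.0) + rng.normal(scale=0.3, size=n_cells)
s2 = rng.laplace(size=n_cells)
S = np.column_stack([s1, s2])
mixing = rng.normal(size=(2, n_genes))
X = S @ mixing + rng.normal(scale=1.0, size=(n_cells, n_genes))

# Step 1: PCA keeps the leading components and drops low-variance noise.
pcs = PCA(n_components=10, random_state=0).fit_transform(X)

# Step 2: ICA looks for statistically independent axes within the
# retained PCs (here 2 components, matching the 2 simulated sources).
ica = FastICA(n_components=2, random_state=0, max_iter=1000)
ics = ica.fit_transform(pcs)

# How well does the best-matching IC track the bimodal source?
corr = [abs(np.corrcoef(ics[:, j], s1)[0, 1]) for j in range(2)]
print("max |corr| with the cell-type signal:", round(max(corr), 3))
```

Because ICA components are unordered and have arbitrary sign, the example checks the absolute correlation of every IC against the known source instead of assuming the first IC is the interesting one.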

This thread may not be a direct answer to your question, but it will hopefully give you some food for thought.


Hello, thanks very much. I had already seen the linked thread, and it is also helpful for getting a good understanding of ICA. Concerning the low % of variability explained by my PCs: I think the problem was in the "find variable features" step. I had basically returned all the variable genes rather than using the 'common' threshold of 3000 genes. Now the first 50 PCs explain 35%, which is slightly better.
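The effect described here can be reproduced in a small simulation. The sketch below is NumPy only and uses made-up data in which 100 of 1000 genes carry shared structure. Ranking genes by raw variance is only a crude stand-in for a real variable-feature step, and `top_k_variance_fraction` is a hypothetical helper name.

```python
import numpy as np

def top_k_variance_fraction(X, k):
    """Fraction of total variance captured by the first k PCs of X."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var[:k].sum() / var.sum()

rng = np.random.default_rng(1)
n_cells, n_genes, n_informative = 400, 1000, 100

# Hypothetical data: only the first 100 genes carry shared structure
# (3 latent factors); the remaining 900 genes are pure noise.
factors = rng.normal(size=(n_cells, 3))
loadings = np.zeros((3, n_genes))
loadings[:, :n_informative] = rng.normal(size=(3, n_informative))
X = factors @ loadings + rng.normal(size=(n_cells, n_genes))

# Keep the most variable genes (crude stand-in for variable-feature selection).
gene_var = X.var(axis=0)
top = np.argsort(gene_var)[::-1][:n_informative]

frac_all = top_k_variance_fraction(X, k=10)
frac_top = top_k_variance_fraction(X[:, top], k=10)
print(f"all genes: {frac_all:.2f}  top {n_informative} genes: {frac_top:.2f}")
```

With all genes included, the noise genes inflate the total variance, so the same leading PCs explain a smaller share of it. Restricting to the most variable genes removes much of that noise from the denominator.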
