What does PCA really tell us about biological data?
19 days ago

A well-clustered PCA plot does provide insight into the underlying structure of biological data by identifying patterns and relationships among variables.

My question is: in omics (proteomics / metabolomics / transcriptomics) research, if the sample replicates are not well clustered in the PCA plot, what does this mean? Does that mean our samples/replicates look similar?

How can we biologically interpret PCA results?

Also, if we see clear clustering in biplots of non-neighbouring PCs, what downstream analyses could we perform? In other words, what is the point of looking at a range of PCs in biplots?

The two figures below are taken from "PCAtools: everything Principal Component Analysis" by Kevin Blighe & Aaron Lun.

19 days ago

To me, PCA starts as a quality control, then perhaps/maybe/at some point it becomes a biological tool.

In your first plot, PC2 (y-axis) seems to correlate a bit better with ER status than PC1 (x-axis, 33% of variation!). You have green and purple points on the -30 side and green and purple points on the +30 side. It looks slightly cleaner on the PC2 side.

There's clearly some other, unmeasured covariate at work here that divides the points across PC1 so cleanly. The rest of the PCAtools tutorial digs into some of the PC1, PC2 correlations and shows that Size, Distant.RFS, Grade are covariates that correlate strongly with PC1, but it looks like a complex interaction that perhaps two dimensions do not capture well visually.
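To make the idea of correlating PCs with covariates concrete, here is a minimal sketch in Python with NumPy. It is not the PCAtools implementation, and everything in it is an assumption for illustration: the data are synthetic, and the names `status`, `size`, `pca_scores` and `pc_covariate_correlation` are hypothetical. A strong continuous covariate is planted in many features and a weaker binary one in a few, which mimics the situation where the grouping you care about shows up on PC2 while something else dominates PC1.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_genes = 60, 200

# Hypothetical sample metadata (synthetic, for illustration only)
status = rng.integers(0, 2, n_samples).astype(float)  # binary label, e.g. a receptor status
size = rng.normal(size=n_samples)                     # continuous covariate

# Synthetic expression matrix (rows = samples, columns = features):
# 'size' drives many features strongly, 'status' drives a few more weakly.
X = rng.normal(size=(n_samples, n_genes))
X[:, :100] += 3.0 * size[:, None]
X[:, 100:120] += 2.0 * status[:, None]

def pca_scores(X, n_components):
    """PCA via SVD of the column-centred matrix; returns scores and variance fractions."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]

def pc_covariate_correlation(scores, covariate):
    """Pearson correlation of each PC's scores with one sample-level covariate."""
    return np.array([np.corrcoef(scores[:, k], covariate)[0, 1]
                     for k in range(scores.shape[1])])

scores, explained = pca_scores(X, 5)
r_size = pc_covariate_correlation(scores, size)
r_status = pc_covariate_correlation(scores, status)

for k in range(5):
    print(f"PC{k + 1}: {explained[k]:.1%} of variance, "
          f"r(size)={r_size[k]:+.2f}, r(status)={r_status[k]:+.2f}")
```

On real data you would replace the synthetic matrix with your normalised expression table and loop over every metadata column you have. A PC that correlates with none of them points to an unmeasured covariate or a batch effect.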

Sometimes, especially with SNPs, you get nice correlations with population background. Wikipedia has an example plot: https://commons.wikimedia.org/wiki/File:Worldwide_human_populations_-_PCA_results.png

Wikipedia plot

But then again, look at those percentages explained: 7.5% + 4.8% is not the world.
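As a rough illustration of why those percentages matter, the sketch below (Python with NumPy) computes the fraction of variance per PC and how many PCs are needed to reach a chosen share. The helper names `explained_variance_ratio` and `n_components_for` are hypothetical, and the matrix is pure synthetic noise, so the numbers it prints say nothing about any real dataset. Only the 7.5% and 4.8% figures come from the plot above.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance carried by each PC (rows = samples)."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return s**2 / np.sum(s**2)

def n_components_for(ratio, threshold):
    """Smallest number of leading PCs whose cumulative share reaches the threshold."""
    cumulative = np.cumsum(ratio)
    return int(np.searchsorted(cumulative, threshold) + 1)

# Percentages quoted for the two axes of the Wikipedia plot
shown = 0.075 + 0.048
not_shown = 1.0 - shown
print(f"Variance on the two plotted axes: {shown:.1%}; not shown: {not_shown:.1%}")

# Hypothetical data: pure noise, so the variance is spread over many PCs
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 300))
ratio = explained_variance_ratio(X)
k80 = n_components_for(ratio, 0.80)
print(f"PC1 + PC2 explain {ratio[:2].sum():.1%}; {k80} PCs are needed to reach 80%")
```

A two-dimensional plot can look cleanly separated and still leave most of the variance in components you never looked at, which is one reason to inspect a range of PCs rather than only PC1 versus PC2.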


