Interpretation of two PCA plots
6 months ago
newbie

Hi,

I am working with a cancer dataset of 250 samples in total, all classified into 6 groups. I plotted a PCA using Gencode protein-coding genes, and it looks like this:

Similarly, I plotted a PCA using only Gencode lncRNAs, which looks like this:

Why does the second PCA, which uses only Gencode lncRNAs, look completely different from the first one? What is the reason? Could anyone please explain this clearly?

Thank you.

RNA-Seq pca clustering
6 months ago
swbarnes2

They don't look completely different to me. On PC1, you have Group 4 at one end; Group 2 and some of Groups 5 and 6 in the middle; and the rest of Groups 5 and 6, together with Groups 1 and 3, at the other end.

Something a little more interesting is happening on PC2 in the lncRNA plot, but PC2 still explains a fairly small percentage of the variance.

Whether the axes appear as they are or as mirror images is completely arbitrary. You can mirror-flip a PCA axis and the result is exactly as accurate.
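To make "exactly as accurate" concrete, here is a minimal NumPy sketch on random toy data (an assumption for illustration, not the poster's data). It computes PCA via SVD, flips the sign of PC1 (negating both its scores and its loadings), and checks that the reconstruction of the data, the variance along PC1, and the distances between samples in the PC1/PC2 plane are all unchanged.

```python
import numpy as np

# Hypothetical toy data: 250 samples x 50 features (random, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(250, 50))

# Center each feature (column) before PCA
Xc = X - X.mean(axis=0)

# PCA via SVD: Xc = U S Vt; rows of Vt are the principal axes (loadings)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S  # sample coordinates on each PC (equivalently Xc @ Vt.T)

# Flip the sign of PC1: negate its score column and its loading vector together
scores_flipped = scores.copy()
Vt_flipped = Vt.copy()
scores_flipped[:, 0] *= -1
Vt_flipped[0, :] *= -1

# Both versions reconstruct the centered data identically
recon = scores @ Vt
recon_flipped = scores_flipped @ Vt_flipped
print(np.allclose(recon, recon_flipped))  # True

# Variance along PC1 is unchanged by the flip
print(np.isclose(scores[:, 0].var(), scores_flipped[:, 0].var()))  # True

def pairwise_dist(P):
    """Euclidean distances between all pairs of rows of P."""
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Distances between samples in the PC1-PC2 plane are also unchanged
print(np.allclose(pairwise_dist(scores[:, :2]),
                  pairwise_dist(scores_flipped[:, :2])))  # True
```

Because the flip negates a score column and its loading vector together, the two negations cancel in every product, which is why nothing about the fit changes; only the picture is mirrored.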

Since the variance explained by PC1 differs between the plots (25% vs. 20%), it seems the plots come from two different PCA analyses, correct? So some differences are expected; however, as you mentioned, they are still similar.
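The point that two different gene sets give two different PCAs (and hence different "% variance explained") can be illustrated with a minimal sketch. It assumes synthetic data: the same 250 samples in 6 groups, with a hypothetical "coding-like" gene set carrying a stronger group signal than a hypothetical "lncRNA-like" set. The percentage for each PC is its squared singular value divided by the sum of all squared singular values of the centered matrix.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each PC of a samples x features matrix."""
    Xc = X - X.mean(axis=0)                    # center each feature
    S = np.linalg.svd(Xc, compute_uv=False)    # singular values, largest first
    return S**2 / np.sum(S**2)                 # PC variances are proportional to S**2

# Hypothetical synthetic data: the same 250 samples in 6 groups, two different gene sets
rng = np.random.default_rng(1)
groups = np.arange(250) % 6                     # group label of each sample
group_shift = np.linspace(-1, 1, 6)[:, None]    # one shared group signal, shape (6, 1)
coding = rng.normal(size=(250, 200)) + 2.0 * group_shift[groups]  # stronger group signal
lncrna = rng.normal(size=(250, 300)) + 1.0 * group_shift[groups]  # weaker group signal

evr_coding = explained_variance_ratio(coding)
evr_lncrna = explained_variance_ratio(lncrna)
print(f"PC1 (coding-like set): {100 * evr_coding[0]:.1f}%")
print(f"PC1 (lncRNA-like set): {100 * evr_lncrna[0]:.1f}%")
```

Both sets order the samples by the same underlying group signal on PC1, yet PC1 explains a different share of the variance in each, because the denominator (total variance) and the strength of the signal depend on which genes are included.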

Thanks for the answer. I understand that they are still similar, but could you explain why they are mirror images?

The decision as to which end of a PC is positive and which is negative is, in essence, arbitrary: if v is a principal axis, then -v is an equally valid one. Someone could take your data, run it on their system, and their computer could choose the opposite orientation because of tiny differences in the computing environment.
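A minimal NumPy sketch of why the sign is arbitrary, on hypothetical random data: the top eigenvector v of the covariance matrix and its negation -v satisfy the eigenvector equation equally well. The `fix_sign` helper is a hypothetical convention (not something from the original thread or a standard API) that flips each axis so its largest-magnitude loading is positive, which makes v and -v map to the same vector and is one way to get consistently oriented plots across runs.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(250, 20))        # hypothetical samples x genes matrix
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)          # 20 x 20 covariance matrix of the genes

# eigh returns eigenvalues in ascending order; take the eigenvector of the largest one
eigvals, eigvecs = np.linalg.eigh(C)
lam, v = eigvals[-1], eigvecs[:, -1]

# v and -v satisfy the eigenvector equation equally well: C v = lam v and C(-v) = lam(-v)
print(np.allclose(C @ v, lam * v))     # True
print(np.allclose(C @ -v, lam * -v))   # True

def fix_sign(V):
    """Hypothetical convention: flip each column of V so that its
    largest-magnitude loading is positive."""
    idx = np.argmax(np.abs(V), axis=0)                 # row of the largest |loading| per column
    signs = np.sign(V[idx, np.arange(V.shape[1])])     # sign of that loading
    return V * signs

# After applying the convention, v and -v map to the same vector
print(np.allclose(fix_sign(v[:, None]), fix_sign(-v[:, None])))  # True
```

Any deterministic rule would work equally well here; the point is only that the mirror image carries no biological information, so choosing a convention just makes two plots easier to compare by eye.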

In both graphs, PC1 shows the same trend. The fact that one looks reversed relative to the other means nothing.