Question

PCA plot interpretation

6.2 years ago

phur • 0

Hi everybody, I have analyzed some RNA-Seq datasets using DESeq2. I have five samples of the normal embryo and three samples of Trisomy 21. I generated a PCA plot, but I am not able to interpret what it means. I have searched online, but since I don't have a statistics background I still can't work out what it shows. I know from the samples that one group is an outlier. Can anybody please help me understand what the data means? I have attached the plot to this forum.

RNA-Seq pca

updated 6.2 years ago by Jean-Karim Heriche • written 6.2 years ago by phur

https://ibb.co/hKbtO7

6.2 years ago by phur

score 2 · Answer 1 · 2018-02-17

6.2 years ago

Kritika ▴ 260

Hi @Phur, the PCA plot shows how your samples group together. As I can see, the two axes capture 32% and 25% of the variation among all samples (Normal + Trisomy 21), so together the first two components explain about 57% of the total variance. This is a 2-dimensional plot, so you get one component (PC1) explaining 32% of the variance and another (PC2) explaining 25%. Hope this is clear!
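To make those percentages concrete: each one is the share of the total variance among the samples that a single principal component captures. Below is a minimal sketch, assuming NumPy and a hypothetical random matrix standing in for real expression data. DESeq2's `plotPCA` does this in R on transformed counts, so this is only a language-agnostic illustration, not its implementation.

```python
import numpy as np

def percent_variance(X):
    """Percentage of the total variance captured by each principal component.

    X: samples x features matrix (e.g. samples x genes of transformed
    expression values). Components come out in decreasing order.
    """
    Xc = X - X.mean(axis=0)                  # center each feature (gene)
    # The squared singular values of the centered matrix are proportional
    # to the variance along each principal component.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return 100 * var / var.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 50))                 # hypothetical: 8 samples, 50 genes
pct = percent_variance(X)
print(f"PC1: {pct[0]:.0f}% variance, PC2: {pct[1]:.0f}% variance")
```

The percentages always sum to 100 across all components. So "32%" and "25%" tell you how much of the overall variation the two plotted axes show. They do not by themselves tell you how close or far apart the groups are; for that you look at where the points fall.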

score 1 · Answer 2 · 2018-02-17

PCA finds axes along which the data points are maximally spread out. These axes are called principal components, and each one is defined as a linear combination of the variables present in the data. They are ordered by the fraction of the total variance that they capture. Plotting the data points in the space of these new axes is often used to visualize whether different categories of data points (e.g. control and treatment) are well separated, or to find whether there is some structure in the data (i.e. whether some pattern is visible in how the points are distributed). Keeping only a limited number of components is also a dimensionality reduction technique widely used before applying machine learning algorithms. It's important to understand how PCA works in order to understand its uses and limitations. For example, it assumes that the directions with maximum variance are the most relevant ones and that orthogonal axes are informative. There are plenty of tutorials and explanations on the web. You may be interested in this post on Cross Validated.
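The description above maps onto a few lines of linear algebra. Below is a minimal sketch, assuming NumPy and computing PCA through the singular value decomposition (SVD) of the centered data. This is one common approach; R's `prcomp` also works via an SVD. The data matrix is hypothetical.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components.

    Returns (scores, components, explained_ratio).
    """
    Xc = X - X.mean(axis=0)                     # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Each row of Vt is one orthonormal axis, i.e. a linear combination
    # of the original variables; rows are sorted by decreasing variance.
    components = Vt[:k]
    scores = Xc @ components.T                  # coordinates of each point on the new axes
    var = s ** 2
    explained_ratio = var[:k] / var.sum()       # fraction of total variance per axis
    return scores, components, explained_ratio

rng = np.random.default_rng(1)
# Hypothetical data: 20 points in 5 dimensions, with a different spread per variable.
X = rng.normal(size=(20, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
scores, comps, ratio = pca(X, k=2)
print(scores.shape)   # (20, 2)
print(ratio)
```

Keeping only the `k` columns of `scores` is the dimensionality-reduction step. It also illustrates the limitation mentioned above: components are chosen purely by variance, so a small but relevant difference between groups can end up on a later component that is not plotted.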