Understanding PCA plot from DiffBind

2.9 years ago

Marco Pannone ▴ 790

Hey everybody

I have a question about the dba.plotPCA function and hope some of you can help. I am performing a differential binding analysis on my ChIP-seq dataset and am trying to interpret the PCA plot that this function produces.

Specifically, I am wondering how I can tell which variables account for each principal component (PC1, PC2, ...). I want to understand which variable in my dataset makes some samples cluster together and which one separates them in the plot.

I hope my question is clear enough, and thanks in advance to anyone who tries to answer.

diffbind pca chip-seq • 1.5k views

updated 2.9 years ago by Rory Stark ★ 2.0k • written 2.9 years ago by Marco Pannone ▴ 790

You can extract the underlying object, which holds the individual components, and then rank them to find which component is making the difference.

2.9 years ago by 1769mkc ★ 1.2k
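One way to make the "rank the components" idea concrete is to run a PCA yourself and measure how strongly each component tracks each metadata variable. The sketch below is base R only and uses simulated data: the `scores` matrix and the `condition` and `replicate` factors are hypothetical stand-ins, and `prcomp` is not necessarily the exact computation that dba.plotPCA performs internally.

```r
set.seed(1)

# Hypothetical data: 200 sites (rows) x 8 samples (columns).
condition <- factor(rep(c("treated", "control"), each = 4))
replicate <- factor(rep(1:4, times = 2))
scores <- matrix(rnorm(200 * 8), nrow = 200)

# Simulate a strong condition effect on the first 50 sites.
treated <- condition == "treated"
scores[1:50, treated] <- scores[1:50, treated] + 3

# PCA with samples as observations (hence the transpose).
pca <- prcomp(t(scores), center = TRUE, scale. = FALSE)

# Fraction of total variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Adjusted R^2 from regressing one component's sample scores on a factor.
adj_r2 <- function(pc, f) summary(lm(pc ~ f))$adj.r.squared

# Rows: PC1..PC3; columns: metadata variables.
assoc <- sapply(
  list(condition = condition, replicate = replicate),
  function(f) apply(pca$x[, 1:3], 2, adj_r2, f = f)
)

print(round(var_explained[1:3], 2))
print(round(assoc, 2))
```

A component with a value near 1 for a given variable is largely explained by that variable; values near or below zero suggest no association. With only a handful of samples these numbers are noisy, so treat them as a guide alongside the plot rather than as a test.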

score 2 · Accepted Answer · 2021-05-11

The PCA plot is an exploratory tool. Different components may or may not correspond to experimental variables as represented in the metadata captured for the experiment. The main way to check whether a known variable corresponds to a component (and hence is driving variation between samples) is to change the colors and labels in the plot based on different metadata categories. For example, you can make the plot with the points colored by DBA_CONDITION and then plot it again using DBA_REPLICATE, to see whether the samples cluster by the known biological signal or by a known technical effect.
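As a minimal sketch of that comparison, the calls below draw the same PCA twice with different coloring. This assumes the `tamoxifen` example object that ships with DiffBind (loaded by `data(tamoxifen_counts)`); substitute your own DBA object, and note that the example's metadata will not match yours.

```r
library(DiffBind)

# Example dataset shipped with DiffBind; loads an object named `tamoxifen`.
# This is an assumption for illustration; use your own DBA object instead.
data(tamoxifen_counts)

# Color by biological condition, label each point with its replicate number.
dba.plotPCA(tamoxifen, attributes = DBA_CONDITION, label = DBA_REPLICATE)

# Same PCA, now colored by replicate, to look for a technical effect.
dba.plotPCA(tamoxifen, attributes = DBA_REPLICATE, label = DBA_CONDITION)
```

If the colors separate cleanly along one axis in the first plot but are mixed in the second, that component is more likely to reflect condition than replicate.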

You may also want to check components beyond the first two. You can do this by using the 3D plot option to see three principal components. You can also change which components are plotted, to show for example the second and third components in a 2-D plot.
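A short sketch of both options, again assuming the `tamoxifen` example object from DiffBind rather than your own data. The `components` and `b3D` arguments are taken from the dba.plotPCA documentation; as far as I know the 3D plot relies on the rgl package being installed.

```r
library(DiffBind)

# Example dataset shipped with DiffBind (assumption for illustration).
data(tamoxifen_counts)

# Second and third components in a 2-D plot.
dba.plotPCA(tamoxifen, attributes = DBA_CONDITION, components = 2:3)

# First three components in a 3-D plot.
dba.plotPCA(tamoxifen, attributes = DBA_CONDITION, b3D = TRUE)
```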