I'm exploring some ChIP-seq data of histone modifications and really enjoy using deepTools, since it has so many features and is quite fast. Recently, I used the plotPCA function and got the output you can see below.
I have to admit that I'm not an expert in PCA, but in the first figure PC2 seems to be the component that best separates the samples. However, in the second figure it is the first factor that explains 80% of the variability. I know that the factors are ranked, so naturally the second explains less than the first, but am I missing something, or is the first factor being plotted as PC2?
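To make clear what I mean by "ranked", here is a minimal sketch of how I understand the explained variance per component to be computed. It uses NumPy on random toy data, not my real counts, and the function name `explained_variance_ratio` is just something I made up for illustration. I am assuming a matrix with samples as rows and genomic bins as columns; I don't know whether plotPCA orients the matrix the same way internally.

```python
import numpy as np

def explained_variance_ratio(matrix):
    """Fraction of total variance carried by each principal component.

    matrix: 2-D array, rows = samples, columns = bins (my assumption).
    """
    # Center every bin (column) across the samples.
    centered = matrix - matrix.mean(axis=0)
    # Singular values come back sorted from largest to smallest.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

# Toy data: 6 hypothetical samples x 200 hypothetical bins.
rng = np.random.default_rng(0)
toy = rng.normal(size=(6, 200))
ratios = explained_variance_ratio(toy)
print(ratios)
```

With this definition the first entry can never be smaller than the second, which is why the two figures confuse me.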
Follow-up question: the variable plotted as PC2 is clearly very important for me, as I'm looking at a time course and the results I see are very encouraging. What I would like to do is extract the regions / bins of the genome that can be used to discriminate the samples. Does anybody have an idea how to do that? I was thinking of Monocle, which does pseudotime-based clustering of single-cell RNA-seq data. I'm trying to find a way to feed my ChIP-seq data into this tool, but there are some problems with normalization. Perhaps there is an easier way?
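To illustrate what I mean by extracting discriminating bins, this is roughly what I have in mind: rank the bins by the absolute size of their loading on the component of interest. It is only a sketch on toy data with a planted time trend, and `top_bins_for_component` is a hypothetical helper, not a deepTools function. For real data I assume I would first need a samples x bins count matrix (for example exported with the `--outRawCounts` option of multiBamSummary) and would have to normalize it somehow beforehand.

```python
import numpy as np

def top_bins_for_component(matrix, component=0, n_top=5):
    """Return indices and loadings of the bins that weigh most on one PC.

    matrix: 2-D array, rows = samples, columns = bins (my assumption).
    component: 0-based index, so 0 is PC1 and 1 is PC2.
    """
    # Center every bin (column) across the samples.
    centered = matrix - matrix.mean(axis=0)
    # Rows of vt are the principal axes, i.e. one loading per bin.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[component]
    # Sort bins by absolute loading, largest first.
    order = np.argsort(np.abs(loadings))[::-1][:n_top]
    return order, loadings[order]

# Toy time course: 6 hypothetical time points x 100 hypothetical bins.
rng = np.random.default_rng(1)
toy = rng.normal(scale=0.1, size=(6, 100))
time = np.arange(6, dtype=float)
# Plant a strong time trend in the first five bins only.
toy[:, :5] += 10.0 * time[:, None]

top_idx, top_loadings = top_bins_for_component(toy, component=0, n_top=5)
print(sorted(top_idx.tolist()))
```

In the toy example the trend dominates PC1, so I use `component=0`; for my data I would pass `component=1` to look at what is plotted as PC2. I'm not sure whether ranking by loadings like this is statistically sound for ChIP-seq bins, which is part of my question.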
Any help is greatly appreciated!