Question: Question about PCA plot using RPKM/FPKM.
hxlei613 wrote (5 months ago):

Hi, I have searched for how to draw a PCA plot from FPKM values, but one question still confuses me. Suppose I have an FPKM matrix (call it matrix_sample) whose columns are sample1, sample2 ... sampleN and control1, control2 ... controlN, and whose rows are gene1, gene2 ... geneN. I want to check whether the data have a batch effect, so ideally the points representing samples and the points representing controls should separate into two groups in the plot.

I have noticed that there are two ways to draw the PCA plot:

a) In this method, matrix_sample is used as is: the rows are genes (gene1 ... geneN) and the columns are samples (or controls).

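   # genes are rows (observations), samples/controls are columns (variables)
   pca = prcomp(matrix_sample)
   # pca$rotation has one row per column of matrix_sample, i.e. one per sample
   plot(pca$rotation[,1],pca$rotation[,2], xlab = "PC1", ylab = "PC2")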

b) In this method, matrix_sample is transposed first.

   # samples/controls become rows (observations), genes become columns (variables)
   pca = prcomp(t(matrix_sample))

I don't know which method is correct: a) does not transpose the matrix, whereas b) does. I know that rows are usually observations and columns are variables. But in biology there are far fewer samples than genes, so genes are conventionally rows and samples are columns, which makes the matrix easier to read. However, the two methods give me different plots, and I don't know why. I couldn't find any information on this, or maybe I am missing something. Please help me out. Thank you very much!

Tags: rna-seq, pca, fpkm
Kevin Blighe (Republic of Ireland) wrote (5 months ago):

I would not use FPKM units for PCA, nor for any other analysis whose aim is to compare samples. FPKM values come from a normalisation that includes no cross-sample normalisation at all, which leaves the samples incomparable; some people also question the within-sample normalisation that produces FPKM. If you must use FPKM, at least convert the values to the z-scale with the zFPKM package in R first, i.e., before running the PCA transformation.
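
As a rough illustration of what the z-scale conversion does, here is a base-R sketch of the idea behind zFPKM as I understand it from Hart et al. (2013), on which the package is based: per sample, take log2 FPKM, find the mode of its kernel density, estimate a standard deviation from the values above the mode assuming a half-normal shape, and standardise. The helper zfpkm_column() is hypothetical and is not the package's own code, and the matrix fpkm is simulated; for real data, use the zFPKM package itself.

```r
# HYPOTHETICAL helper, not the zFPKM package's implementation: a base-R
# approximation of the zFPKM idea for one sample (one column of FPKM values).
zfpkm_column <- function(fpkm) {
  log_fpkm <- log2(fpkm)
  log_fpkm[!is.finite(log_fpkm)] <- NA              # FPKM = 0 gives -Inf; treat as missing
  d <- density(log_fpkm, na.rm = TRUE)              # kernel density of log2 FPKM
  mu <- d$x[which.max(d$y)]                         # its mode, taken as the centre
  u <- mean(log_fpkm[log_fpkm > mu], na.rm = TRUE)  # mean of the values above the mode
  sigma <- (u - mu) * sqrt(pi / 2)                  # half-normal: E[X - mu | X > mu] = sigma * sqrt(2 / pi)
  (log_fpkm - mu) / sigma                           # z-scores on the log2 scale
}

# Simulated FPKM matrix: 2000 genes (rows) x 6 samples (columns)
set.seed(7)
fpkm <- matrix(2^rnorm(2000 * 6, mean = 3, sd = 1.5), nrow = 2000,
               dimnames = list(paste0("gene", 1:2000), paste0("S", 1:6)))
z <- apply(fpkm, 2, zfpkm_column)          # same shape: genes x samples
z <- z[complete.cases(z), , drop = FALSE]  # drop genes with FPKM = 0 in any sample
pca <- prcomp(t(z))                        # samples as rows, as in method b)
```

Dropping genes that have a zero in any sample is a simplification chosen here so that prcomp() receives only finite values; other choices are possible.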

It is perfectly fine to perform PCA on either the transposed or the un-transposed data matrix. However, the x element returned by prcomp() refers to different things in each case: with prcomp(t(matrix_sample)) each row of x is a sample, whereas with prcomp(matrix_sample) each row of x is one of your genes.
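
A minimal sketch with simulated data (the matrix expr and the labels in groups are invented for illustration) shows what each orientation returns. It uses only base R's prcomp(), whose default center = TRUE centres each column: in method b) that means each gene, in method a) each sample. That difference in centring is one likely reason the two plots in the question disagree; once the genes are centred by hand, method a)'s rotation matches method b)'s x up to sign and a constant per component.

```r
# Simulated data only: 500 genes (rows) x 8 libraries (columns), 4 "sample" + 4 "control"
set.seed(42)
n_genes <- 500
groups <- rep(c("sample", "control"), each = 4)
expr <- matrix(rnorm(n_genes * 8, mean = 5), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               c(paste0("sample", 1:4), paste0("control", 1:4))))
# Raise the first 100 genes in the "sample" group so the two groups truly differ
expr[1:100, groups == "sample"] <- expr[1:100, groups == "sample"] + 3

# Method b): transpose, so samples are observations (rows) and genes are variables
pca_b <- prcomp(t(expr))     # default center = TRUE centres each gene
dim(pca_b$x)                 # 8 x 8: PC scores, one row per sample
dim(pca_b$rotation)          # 500 x 8: loadings, one row per gene

# Method a): no transpose, so genes are observations and samples are variables
pca_a <- prcomp(expr)        # default center = TRUE centres each sample, not each gene
dim(pca_a$x)                 # 500 x 8: PC scores, one row per gene
dim(pca_a$rotation)          # 8 x 8: loadings, one row per sample

# The usual sample-level PCA plot uses method b)'s scores
plot(pca_b$x[, 1], pca_b$x[, 2], xlab = "PC1", ylab = "PC2",
     col = ifelse(groups == "sample", "red", "blue"), pch = 19)

# If genes are centred first, method a)'s loadings match method b)'s scores
# up to sign and a constant per component (the singular value)
expr_gc <- expr - rowMeans(expr)              # centre each gene (row)
pca_a_gc <- prcomp(expr_gc, center = FALSE)   # no further column centring
d1 <- pca_b$sdev[1] * sqrt(ncol(expr) - 1)    # first singular value of centred t(expr)
all.equal(abs(pca_a_gc$rotation[, 1]), abs(pca_b$x[, 1]) / d1)  # TRUE
```

In other words, for a samples-versus-controls plot, method b) with pca$x is the direct route; method a) only gives the same picture if the genes are centred beforehand, and even then its axes are rescaled.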


From the prcomp() documentation, describing the returned value x: "if retx is true the value of the rotated data (the centred (and scaled if requested) data multiplied by the rotation matrix) is returned. Hence, cov(x) is the diagonal matrix diag(sdev^2). For the formula method, napredict() is applied to handle the treatment of values omitted by the na.action."
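
The two claims in that passage (x is the centred data times the rotation matrix, and cov(x) equals diag(sdev^2)) can be checked directly. A minimal sketch on a small simulated matrix m, which is not real data:

```r
# Simulated matrix only: 8 observations (rows) x 20 variables (columns)
set.seed(1)
m <- matrix(rnorm(8 * 20), nrow = 8)
pca <- prcomp(m)   # retx = TRUE by default, so pca$x is returned

# x is the centred data multiplied by the rotation matrix
x_manual <- scale(m, center = TRUE, scale = FALSE) %*% pca$rotation
all.equal(unname(x_manual), unname(pca$x))                          # TRUE

# Hence cov(x) is diagonal, with the squared standard deviations on the diagonal
all.equal(cov(pca$x), diag(pca$sdev^2), check.attributes = FALSE)  # TRUE
```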
