Should the expression data be centered before PCA?
1
0
Entering edit mode
2.5 years ago
Raheleh ▴ 210

I have expression profile of 27 samples (10 normal, 17 tumors). I did PCA analysis to check the quality of the samples to see them clustered in 2 different groups. I did it in two ways after RMA normalization and log2 transformation. 1. I did it without centring the data and the result showed that the quality is not good.pc <- prcomp(exp)

1. For the second time I centred the data (mean subtraction)

exp.scale <- t(scale(t(exp), scale = F))
pc <- prcomp(exp.scale)

My question is, if the PCA is calculated from the covariance matrix and mean centering does not affect the covariance matrix; why the output is different? After centering, the quality of the samples is much better! Should I consider that the quality of my samples are good or not?

Thanks!

PCA Centering expression data • 2.4k views
1
Entering edit mode

You did not transpose your matrix prior to running prcomp. In a gene expression matrix with rows = genes and columns = samples one would run PCA like prcomp(t(data)), see e.g. the source code of DESeq2::plotPCA. Re-run the PCA using the log2-normalized intensity values and see how it performs.

getMethod("plotPCA","DESeqTransform") Method Definition:

function (object, ...)
{
.local <- function (object, intgroup = "condition", ntop = 500,
returnData = FALSE)
{
rv <- rowVars(assay(object))
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop,
length(rv)))]
## => HERE <= ##
pca <- prcomp(t(assay(object)[select, ]))
(... and so on ...)


For the future please see How to add images to a Biostars post. You need the pull path to the image.

0
Entering edit mode

Could you please fix the images ?

1
Entering edit mode
2.5 years ago

In your first example, you ARE centering your data, as the default of prcomp() is to center your data. Look up the function docs: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/prcomp.html

In your second example, you are actually centering the data twice, but one is by row (due to your transpose), while the other column.

I am not sure how you can say that your data's quality is 'not good' from the first bi-plot(?) - how are you defining 'not good'?

0
Entering edit mode

@Kevin

I have PCA data , genes in rows and samples in column

ExpAccession    ExpA 1 N    ExpA 2 N    ExpA 3 N    ExpA 4 N    ExpA 5 N    ExpA 1 T    ExpA 2 T    ExpA 3 T    ExpA 4 T    ExpA 5 T
P10645  -1.80   -2.31   -2.12   -1.99   -1.92   -1.98   -0.57   -0.99   -0.48   2.62
P31327  -1.57   -1.90   -1.98   -1.79   -1.71   -0.02   -0.67   -0.86   -1.60   2.53
Q9BYZ8  -1.08   -1.80   -1.62   -2.07   -1.51   -1.72   -0.40   0.57    -1.52   2.48
O43745  -2.59   -2.02   -2.65   -1.39   -1.68   1.00    -1.44   -0.78   -1.81   2.46
Q99795  -1.68   -2.15   -2.40   -2.08   -2.64   0.45    -0.48   -0.32   -1.46   2.42
Q02817  -1.03   -1.47   -1.19   -1.35   -1.31   -1.38   -0.49   0.10    -1.21   2.38


How I can adapt my matrix to your class (p) to carry on with your tutorial on plotting PCA?

Thank you

1
Entering edit mode

Hey, it looks like the first column of your data comprises the accession names. So, you will have to set those as the rownames.

Essentially, this may work:

rownames(data) <- data$ExpAccession # assign rownames data <- data[,-1] # remove the first column # perform PCA require(PCAtools) p <- pca(x, metadata = NULL, removeVar = 0.1)  ADD REPLY 0 Entering edit mode Thank you so much In some part I am getting this error > plotloadings(p, + rangeRetain = 0.01, + labSize = 3.0, + title = 'Loadings plot', + subtitle = 'PC1, PC2', + caption = 'Top 1% variables', + shape = 24, + col = c('limegreen', 'black', 'red3'), + drawConnectors = TRUE) -- variables retained: Q6UX53, P62684, Q06141, Q9TNN7, O00425, P01911, P12104, Q92968, P01889, Q9P2F6, P30443 Error in grid.Call(C_convert, x, as.integer(whatfrom), as.integer(whatto), : Viewport has zero dimension(s) >  ADD REPLY 1 Entering edit mode How many samples are in your data? Were you able to produce the bi-plot and SCREE plot? ADD REPLY 0 Entering edit mode Thank you 10 samples I have and I have produced both bi-plot and SCREE plots ADD REPLY 1 Entering edit mode hmm... I am not sure. Can you add the parameter: components = getComponents(p, 1:3)  ? Also, try to change the value of rangeRetain ADD REPLY 0 Entering edit mode Sorry Kevin Is it possible to have only 2 colors in PCA plot? For example grey for control and black for treatment? ADD REPLY 0 Entering edit mode The bi-plot? How have you created your pca object and which metadata do you have? ADD REPLY 0 Entering edit mode Thank you I don't have metadata If this is my data ExpAccession ExpA 1 N ExpA 2 N ExpA 3 N ExpA 4 N ExpA 5 N ExpA 1 T ExpA 2 T ExpA 3 T ExpA 4 T ExpA 5 T P10645 -1.80 -2.31 -2.12 -1.99 -1.92 -1.98 -0.57 -0.99 -0.48 2.62 P31327 -1.57 -1.90 -1.98 -1.79 -1.71 -0.02 -0.67 -0.86 -1.60 2.53 Q9BYZ8 -1.08 -1.80 -1.62 -2.07 -1.51 -1.72 -0.40 0.57 -1.52 2.48 O43745 -2.59 -2.02 -2.65 -1.39 -1.68 1.00 -1.44 -0.78 -1.81 2.46 Q99795 -1.68 -2.15 -2.40 -2.08 -2.64 0.45 -0.48 -0.32 -1.46 2.42 Q02817 -1.03 -1.47 -1.19 -1.35 -1.31 -1.38 -0.49 0.10 -1.21 2.38 rownames(data) <- data$ExpAccession # assign rownames
data <- data[,-1] # remove the first column

# perform PCA
require(PCAtools)
p <- pca(x, metadata = NULL, removeVar = 0.1)

biplot(p)

1
Entering edit mode

You will have to create metadata that has the same rownames as the colnames of x, and then assign it to metadata when you run pca():

p <- pca(x, metadata = myMetaData, removeVar = 0.1)


Then, to assign colours, you use colBy with biplot(). Examples HERE.

0
Entering edit mode

Sorry Kevin

By this code I have a biplot, I am saving that with width 5100 and length 2000 and as you are seeing axis labels and title are too small

How I can make them bigger please?

p2=biplot(pb,
colby = 'condition',colkey = c('N'='grey75', 'T'='black','T(scar)'='grey50'),
legendPosition = 'right',    hline = 0, vline = c(-25, 0,25),
vlineType = c('dotdash', 'solid', 'dashed'),
gridlines.major = FALSE, gridlines.minor = FALSE,
pointSize = 10, legendLabSize = 20, legendIconSize =10,
drawConnectors = FALSE,
title = 'PCA plot for expriment B (patients 6-10, N=normal esophagus, T=tumour)', labSize = 15)

p1=biplot(pa,
colby = 'condition',colkey = c('N'='grey75', 'T'='black'),
legendPosition = 'right',    hline = 0, vline = c(-25, 0,25),
vlineType = c('dotdash', 'solid', 'dashed'),
gridlines.major = FALSE, gridlines.minor = FALSE,
pointSize =10, legendLabSize = 20, legendIconSize = 10,
drawConnectors = FALSE,
title = 'PCA plot for expriment A (patients 1-5, N=normal esophagus, T=tumour)', labSize = 15)

plot_grid(p1,p2,label_size = 10)


1
Entering edit mode

Hey, the labels do not usually come out that small... If you run ?biplot in R, you will see all of the options for the biplot() function. The ones in which you will be interested are:

biplot(
...,
axisLabSize = 16,
title = '',
subtitle = '',
caption = '',
titleLabSize = 16,
subtitleLabSize = 12,
captionLabSize = 12,
..)

0
Entering edit mode

Sorry @Kevin

Really thank you very much for PCAtools package

You think I better use all of my data from RNA-seq for PCA biplot or using differentially expressed genes between tumour and normal tissue would make a better view of data?

0
Entering edit mode

It depends on what your aim is for using PCAtools? Usually, people use the entire dataset.

0
Entering edit mode

We treated some patients with chemotherapy, some patients responded and some not. So in PCA responding patients would group with normal samples and from their tumour only a scar remains. The goal is to show responding patients being grouped with adjacent normal samples

1
Entering edit mode

You can try 2 things:

1. PCAtools using entire dataset - this shows any 'natural' grouping based on the entire transcriptome
2. hierarchical clustering + heatmap using differentially expressed genes - this shows the ability of the differentially expressed genes to distinguish the groups
0
Entering edit mode

Hi Kevin. Thanks for the point!

I defined it as "not good" because in the first figure the samples from two different groups are not separated from each other as good as in the second figure. Isn't like this? If I am wrong please correct me. Thanks!