Question: Should the expression data be centered before PCA?
0
gravatar for Raheleh
6 weeks ago by
Raheleh90
Raheleh90 wrote:

I have expression profile of 27 samples (10 normal, 17 tumors). I did PCA analysis to check the quality of the samples to see them clustered in 2 different groups. I did it in two ways after RMA normalization and log2 transformation. 1. I did it without centring the data and the result showed that the quality is not good.pc <- prcomp(exp)

Without Centering

  1. For the second time I centred the data (mean subtraction)

    exp.scale <- t(scale(t(exp), scale = F))
    pc <- prcomp(exp.scale)

Centered data

My question is, if the PCA is calculated from the covariance matrix and mean centering does not affect the covariance matrix; why the output is different? After centering, the quality of the samples is much better! Should I consider that the quality of my samples are good or not?

Thanks!

centering expression data pca • 259 views
ADD COMMENTlink modified 6 weeks ago • written 6 weeks ago by Raheleh90

Could you please fix the images ?

ADD REPLYlink written 6 weeks ago by badredda110

You did not transpose your matrix prior to running prcomp. In a gene expression matrix with rows = genes and columns = samples one would run PCA like prcomp(t(data)), see e.g. the source code of DESeq2::plotPCA. Re-run the PCA using the log2-normalized intensity values and see how it performs.

getMethod("plotPCA","DESeqTransform") Method Definition:

function (object, ...) 
{
    .local <- function (object, intgroup = "condition", ntop = 500, 
        returnData = FALSE) 
    {
        rv <- rowVars(assay(object))
        select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, 
            length(rv)))]
        ## => HERE <= ## 
        pca <- prcomp(t(assay(object)[select, ]))
        (... and so on ...)

For the future please see How to add images to a Biostars post. You need the pull path to the image.

ADD REPLYlink modified 6 weeks ago • written 6 weeks ago by ATpoint23k

In your first example, you ARE centering your data, as the default of prcomp() is to center your data. Look up the function docs: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/prcomp.html

In your second example, you are actually centering the data twice, but one is by row (due to your transpose), while the other column.

I am not sure how you can say that your data's quality is 'not good' from the first bi-plot(?) - how are you defining 'not good'?

ADD REPLYlink modified 6 weeks ago • written 6 weeks ago by Kevin Blighe48k

@Kevin

I have PCA data , genes in rows and samples in column

ExpAccession    ExpA 1 N    ExpA 2 N    ExpA 3 N    ExpA 4 N    ExpA 5 N    ExpA 1 T    ExpA 2 T    ExpA 3 T    ExpA 4 T    ExpA 5 T
P10645  -1.80   -2.31   -2.12   -1.99   -1.92   -1.98   -0.57   -0.99   -0.48   2.62
P31327  -1.57   -1.90   -1.98   -1.79   -1.71   -0.02   -0.67   -0.86   -1.60   2.53
Q9BYZ8  -1.08   -1.80   -1.62   -2.07   -1.51   -1.72   -0.40   0.57    -1.52   2.48
O43745  -2.59   -2.02   -2.65   -1.39   -1.68   1.00    -1.44   -0.78   -1.81   2.46
Q99795  -1.68   -2.15   -2.40   -2.08   -2.64   0.45    -0.48   -0.32   -1.46   2.42
Q02817  -1.03   -1.47   -1.19   -1.35   -1.31   -1.38   -0.49   0.10    -1.21   2.38

I read your tutorial https://bioconductor.org/packages/release/bioc/vignettes/PCAtools/inst/doc/PCAtools.html

You start with class of pca

How I can adapt my matrix to your class (p) to carry on with your tutorial on plotting PCA?

Thank you

ADD REPLYlink modified 29 days ago • written 29 days ago by Angel3.5k
1

Hey, it looks like the first column of your data comprises the accession names. So, you will have to set those as the rownames.

Essentially, this may work:

rownames(data) <- data$ExpAccession # assign rownames
data <- data[,-1] # remove the first column

# perform PCA
require(PCAtools)
p <- pca(x, metadata = NULL, removeVar = 0.1)
ADD REPLYlink modified 28 days ago • written 28 days ago by Kevin Blighe48k

Thank you so much

In some part I am getting this error

> plotloadings(p,
+              rangeRetain = 0.01,
+              labSize = 3.0,
+              title = 'Loadings plot',
+              subtitle = 'PC1, PC2',
+              caption = 'Top 1% variables',
+              shape = 24,
+              col = c('limegreen', 'black', 'red3'),
+              drawConnectors = TRUE)
-- variables retained:
Q6UX53, P62684, Q06141, Q9TNN7, O00425, P01911, P12104, Q92968, P01889, Q9P2F6, P30443
Error in grid.Call(C_convert, x, as.integer(whatfrom), as.integer(whatto),  : 
  Viewport has zero dimension(s)
>
ADD REPLYlink written 28 days ago by Angel3.5k
1

How many samples are in your data? Were you able to produce the bi-plot and SCREE plot?

ADD REPLYlink written 27 days ago by Kevin Blighe48k

Thank you 10 samples I have and I have produced both bi-plot and SCREE plots

ADD REPLYlink written 27 days ago by Angel3.5k
1

hmm... I am not sure.

Can you add the parameter:

components = getComponents(p, 1:3)

?

Also, try to change the value of rangeRetain

ADD REPLYlink written 27 days ago by Kevin Blighe48k

Sorry Kevin

Is it possible to have only 2 colors in PCA plot? For example grey for control and black for treatment?

ADD REPLYlink written 14 days ago by Angel3.5k

The bi-plot? How have you created your pca object and which metadata do you have?

ADD REPLYlink written 14 days ago by Kevin Blighe48k

Thank you I don't have metadata

If this is my data

ExpAccession    ExpA 1 N    ExpA 2 N    ExpA 3 N    ExpA 4 N    ExpA 5 N    ExpA 1 T    ExpA 2 T    ExpA 3 T    ExpA 4 T    ExpA 5 T
P10645  -1.80   -2.31   -2.12   -1.99   -1.92   -1.98   -0.57   -0.99   -0.48   2.62
P31327  -1.57   -1.90   -1.98   -1.79   -1.71   -0.02   -0.67   -0.86   -1.60   2.53
Q9BYZ8  -1.08   -1.80   -1.62   -2.07   -1.51   -1.72   -0.40   0.57    -1.52   2.48
O43745  -2.59   -2.02   -2.65   -1.39   -1.68   1.00    -1.44   -0.78   -1.81   2.46
Q99795  -1.68   -2.15   -2.40   -2.08   -2.64   0.45    -0.48   -0.32   -1.46   2.42
Q02817  -1.03   -1.47   -1.19   -1.35   -1.31   -1.38   -0.49   0.10    -1.21   2.38

rownames(data) <- data$ExpAccession # assign rownames
data <- data[,-1] # remove the first column

# perform PCA
require(PCAtools)
p <- pca(x, metadata = NULL, removeVar = 0.1)

 biplot(p)
ADD REPLYlink written 14 days ago by Angel3.5k
1

You will have to create metadata that has the same rownames as the colnames of x, and then assign it to metadata when you run pca():

p <- pca(x, metadata = myMetaData, removeVar = 0.1)

Then, to assign colours, you use colBy with biplot(). Examples HERE.

ADD REPLYlink written 14 days ago by Kevin Blighe48k

Sorry Kevin

By this code I have a biplot, I am saving that with width 5100 and length 2000 and as you are seeing axis labels and title are too small

How I can make them bigger please?

p2=biplot(pb,
                       colby = 'condition',colkey = c('N'='grey75', 'T'='black','T(scar)'='grey50'),
                       legendPosition = 'right',    hline = 0, vline = c(-25, 0,25),
                       vlineType = c('dotdash', 'solid', 'dashed'),
                       gridlines.major = FALSE, gridlines.minor = FALSE,
                       pointSize = 10, legendLabSize = 20, legendIconSize =10,
                       drawConnectors = FALSE,
                       title = 'PCA plot for expriment B (patients 6-10, N=normal esophagus, T=tumour)', labSize = 15)

 p1=biplot(pa,
                       colby = 'condition',colkey = c('N'='grey75', 'T'='black'),
                       legendPosition = 'right',    hline = 0, vline = c(-25, 0,25),
                       vlineType = c('dotdash', 'solid', 'dashed'),
                       gridlines.major = FALSE, gridlines.minor = FALSE,
                       pointSize =10, legendLabSize = 20, legendIconSize = 10,
                       drawConnectors = FALSE,
                       title = 'PCA plot for expriment A (patients 1-5, N=normal esophagus, T=tumour)', labSize = 15)

 plot_grid(p1,p2,label_size = 10)

enter image description here

ADD REPLYlink written 11 days ago by Angel3.5k
1

Hey, the labels do not usually come out that small... If you run ?biplot in R, you will see all of the options for the biplot() function. The ones in which you will be interested are:

biplot(
  ...,
  axisLabSize = 16,
  title = '',
  subtitle = '',
  caption = '',
  titleLabSize = 16,
  subtitleLabSize = 12,
  captionLabSize = 12,
  ..)
ADD REPLYlink modified 10 days ago • written 11 days ago by Kevin Blighe48k

Sorry @Kevin

Really thank you very much for PCAtools package

You think I better use all of my data from RNA-seq for PCA biplot or using differentially expressed genes between tumour and normal tissue would make a better view of data?

ADD REPLYlink written 9 days ago by Angel3.5k

It depends on what your aim is for using PCAtools? Usually, people use the entire dataset.

ADD REPLYlink written 9 days ago by Kevin Blighe48k

We treated some patients with chemotherapy, some patients responded and some not. So in PCA responding patients would group with normal samples and from their tumour only a scar remains. The goal is to show responding patients being grouped with adjacent normal samples

ADD REPLYlink written 9 days ago by Angel3.5k
1

You can try 2 things:

  1. PCAtools using entire dataset - this shows any 'natural' grouping based on the entire transcriptome
  2. hierarchical clustering + heatmap using differentially expressed genes - this shows the ability of the differentially expressed genes to distinguish the groups
ADD REPLYlink modified 9 days ago • written 9 days ago by Kevin Blighe48k

Hi Kevin. Thanks for the point!

I defined it as "not good" because in the first figure the samples from two different groups are not separated from each other as good as in the second figure. Isn't like this? If I am wrong please correct me. Thanks!

ADD REPLYlink written 26 days ago by Raheleh90
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 761 users visited in the last hour