Question: Is it possible to make a PCA plot for samples using TPM expression values in R?
rimgubaev (Russia/Moscow/Skoltech) wrote, 14 months ago:

I have a table with TPM expression values for several samples (10-12), and I want to create a PCA plot to estimate the similarity of replicates within each condition. If this is possible, could you please suggest some pipelines or commands for R?

rna-seq tpm pca • 831 views

I think you can do it like this:

library(scater)

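# raw_counts is a placeholder name: your genes-by-samples matrix of raw values
example_sce <- SingleCellExperiment(
    assays = list(counts = raw_counts))

cpm(example_sce) <- calculateCPM(example_sce)

# Current scater releases use logNormCounts() in place of normalize(),
# and plotPCA() expects runPCA() to have been run first
example_sce <- logNormCounts(example_sce)
example_sce <- runPCA(example_sce, ncomponents = 2)

plotPCA(example_sce)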

The scater vignette shows many more options:

https://bioconductor.org/packages/devel/bioc/vignettes/scater/inst/doc/vignette-dataviz.html#generating-pca-plots

written 14 months ago by jivarajivaraj
andrew.j.skelton73 (London) wrote, 14 months ago:

You should transform your data to a log-like scale first. If you're analysing in DESeq2, look at the vst or rlog methods; if you're using limma-voom, your data should already be good to go. Have a look at the tximport package if you're confused about these different input metrics.
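For a plain TPM table, one common way to get to a log-like scale is log2(TPM + 1). Here is a minimal base R sketch, assuming a genes-by-samples matrix. The name `tpm`, the simulated values and the pseudocount of 1 are illustrative assumptions, not something from the original post.

```r
# Hypothetical TPM matrix: 200 genes (rows) x 6 samples (columns), simulated
set.seed(1)
tpm <- matrix(rexp(200 * 6, rate = 0.05), nrow = 200,
              dimnames = list(paste0("gene", 1:200), paste0("SAM", 1:6)))

# Log-like scale; the pseudocount of 1 keeps zero TPM values finite
log_tpm <- log2(tpm + 1)

# Drop genes with zero variance across samples; they add nothing to the PCA
keep    <- apply(log_tpm, 1, var) > 0
log_tpm <- log_tpm[keep, , drop = FALSE]

# prcomp() expects samples in rows, so transpose the matrix
pca <- prcomp(t(log_tpm))
head(pca$x[, 1:2])
```

With a real table, `tpm` would be your own matrix (for example read in with read.table and converted with as.matrix), and `log_tpm` can then be passed to whatever PCA plotting code you prefer.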

When you've got your data in the correct scale, here's a nice bit of code to produce a PCA - note I'm using dummy data in this case.

library(tidyverse) #CRAN - install.packages("tidyverse")
library(ggrepel)   #CRAN - install.packages("ggrepel")

# Generate some fake data
set.seed(73)
mat.row      <- 1000
mat.col      <- 15
data.pheno   <- data.frame(SampleID   = paste0("SAM", 1:mat.col),
                           SampleType = rep(c("A","B","C"), times = mat.col / 3),
                           stringsAsFactors = FALSE)
foo          <- rnorm(mat.row * mat.col, mean = 300) %>% 
                log2 %>% 
                matrix(., ncol = mat.col) %>% 
                `colnames<-`(data.pheno$SampleID)
# 

# Generate PCA Data & Proportion of variability
pca          <- foo %>% t %>% prcomp
d            <- pca$x %>% as.data.frame %>% 
                rownames_to_column("SampleID") %>%   # add_rownames() is deprecated
                left_join(data.pheno, by = "SampleID")
pcv          <- round((pca$sdev)^2 / sum(pca$sdev^2)*100, 2)
# 

# Make a pretty Picture
plot.pca    <- ggplot(d, aes(PC1,PC2,colour = SampleType)) +
               geom_point() +
               xlab(label=paste0("PC1 (", pcv[1], "%)")) +
               ylab(label=paste0("PC2 (", pcv[2], "%)")) +
               theme_bw() +
               geom_label_repel(aes(label = SampleType), show.legend = FALSE) +
               theme(axis.title.x = element_text(size=15),
                     axis.title.y = element_text(size=15)) +
               labs(title    = "My Fake PCA",
                    subtitle = "With some random data",
                    caption  = "Coloured by my random phenotype")
print(plot.pca)
#

PCA Plot
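The percentages in the axis labels come from the squared standard deviations of the components, each divided by their total. As a quick sanity check, the same proportions can be read from summary(). This is a small sketch on simulated data; the matrix `m` is a made-up stand-in, not the data used for the plot.

```r
set.seed(73)
m   <- matrix(rnorm(1000 * 15), ncol = 15)   # hypothetical genes x samples matrix
pca <- prcomp(t(m))                          # samples in rows

# Percentage of total variance explained by each component
pcv <- pca$sdev^2 / sum(pca$sdev^2) * 100

# summary() reports the same proportions as fractions, rounded for printing
imp <- summary(pca)$importance["Proportion of Variance", ]
all.equal(unname(imp) * 100, pcv, tolerance = 1e-3)
```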


Very nice!

written 14 months ago by Kevin Blighe