Question

How to generate a 2D PCA plot from bulk RNA-seq data (log2 CPM) using the PCAtools?

1

Entering edit mode

2.5 years ago

vHelmholtz ▴ 40

Hi all,

I have bulk RNA-seq data with 12 samples - WT (x4), 'A' KO (x4), and 'B' KO (x4). I want to generate a 2D PCA plot (biplot) like below figure to look at the relationship between the samples.

a 2D PCA plot

I have tried an R package, 'PCAtools,' but it looks not work correctly as below. the PCA plot generated by the code below

I have pasted my code and data below. I will very much appreciate it if you share any advice or suggestions.

Thanks in advance!

Joshua

library(PCAtools)
data <-read.csv("C:/.../all,gene_log2cpm_revised.csv", fileEncoding = 'UTF-8-BOM')

groups = c(rep("WT", 4), rep("A-KO", 4), rep("B-KO", 4))
cols = c('red', 'green', 'blue')[factor(groups)]

data$gene_name = as.numeric(as.factor(data$gene_name))

pca = prcomp(data)
pca$x
pca$sdev
biplot(pca, cex=0.7, scale=T, xlim=c(-0.6,+0.6))

Data file format:

Data file format top genes

RNA-seq PCA • 2.2k views

ADD COMMENT • link 2.5 years ago by vHelmholtz ▴ 40

score 4 · Accepted Answer · 2021-10-30

4

Entering edit mode

2.5 years ago

Kevin Blighe 87k

Hi Joshua,

Your second plot is not produced by my package, PCAtools. It is produced via base R functions.

Can you take a look through my vignette, please? - https://bioconductor.org/packages/release/bioc/vignettes/PCAtools/inst/doc/PCAtools.html#quick-start-deseq2

You probably need something like:

library(PCAtools)
data <- read.csv(
  'C:/.../all,gene_log2cpm_revised.csv',
  fileEncoding = 'UTF-8-BOM',
  row.names = 1)

groups = c(rep("WT", 4), rep("A-KO", 4), rep("B-KO", 4))
cols = c('red', 'green', 'blue')[factor(groups)]

metadata <- data.frame(groups = groups)
rownames(metadata) <- colnames(data)
p <- pca(data, metadata = metadata)

biplot(p, colby = 'groups')

Kevin

ADD COMMENT • link 2.5 years ago by Kevin Blighe 87k

1

Entering edit mode

I deleted the gene_name column from the CSV file, upload the file, and generated a PCA biplot as below.

Thanks, Kevin! You helped me a lot.

-Joshua

enter image description here

ADD REPLY • link 2.5 years ago by vHelmholtz ▴ 40

0

Entering edit mode

Thank you so much for your help, Kevin!

I found that the code below does not work. It seems that is because the number of columns (13, including the first column of the data file, 'gene_name') does not match that of rows (12)

rownames(metadata) <- colnames(data)

-> Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length

Could you help me with how to exclude the 'gene_name' column from the PCA analysis?

-Joshua

ADD REPLY • link 2.5 years ago by vHelmholtz ▴ 40