Question

Exporting the list of top loading genes from a particular principal component

2

Entering edit mode

3.1 years ago

n.tear ▴ 80

I have just used PCA tool to create a PC loadings graph in order to identify the genes explaingin the variance for PC3 using the following code:

library(PCAtools)
rld.sm.output <- as.data.frame(assay(rld.sm))
metadata <- coldata

require(org.Hs.eg.db)
mapping <- mapIds(
  org.Hs.eg.db,
  keys = rownames(rld.sm.output),
  column = 'SYMBOL',
  keytype = 'ENSEMBL')
rownames(rld.sm.output) <- make.unique(ifelse(is.na(mapping), rownames(rld.sm.output), mapping))

pca.project <- pca(rld.sm.output, metadata = metadata, removeVar = 0.1)

plotloadings(pca.project,
    rangeRetain = 0.05,
    labSize = 4.0,
    title = 'Loadings plot - top 5% contributing',
    subtitle = 'PC1, PC2, PC3, PC4, PC5',
    caption = 'Top 5% variables',
    drawConnectors = TRUE)

enter image description here

My Question:

How can I extract the top loading genes from the PC of interest (PC3 in this case)? When I extract the loadings and then sort by largest to smallest, the genes dont match my loadings graph (apart from the two genes with the highest loading):

write.csv(pca.project$loadings[1:3], file='PCAloadings.csv')

enter image description here

PCAtools • 1.5k views

ADD COMMENT • link updated 3.1 years ago by Kevin Blighe 87k • written 3.1 years ago by n.tear ▴ 80

score 2 · Answer 1 · 2021-03-31

2

Entering edit mode

3.1 years ago

Kevin Blighe 87k

Hi again, it's literally about filtering on that loadings object. Keep in mind, however, that the values can be positive or negative, with both having equal weighting and having the same sign as the values on the corresponding bi-plot.

For example, CAPN8, with a negative score of -0.00605, will be associated with whatever samples are shifted toward the negative end of your PC1 on a bi-plot.

For the plotloadings() function, I think that the way that I filter is by taking, e.g., the genes that fall within the top 5% of the range of values for each PC. There are obviously many other ways to filter these, too.

Kevin

ADD COMMENT • link 3.1 years ago by Kevin Blighe 87k

0

Entering edit mode

Hi again Kevin,

Thank you for your response it has helped me to understand better. But one thing:

In the csv file I have shared, I have sorted by PC3 loadings by size in column D.

I see PI3 and ENSG00000267303 at the top of column D...

Consequently I do see PI3 and ENSG00000267303 in the PC3 section of my plotloadings() graph...

but why do I not see PPP1R1B in the graph if the loadings .csv suggests it is also in the top 5%?

Hope that makes sense

ADD REPLY • link 3.1 years ago by n.tear ▴ 80

0

Entering edit mode

plotloadings usually displays a message to terminal when the function is executed. Is the gene listed there? It can be that it falls just outside the top 5% of the range

ADD REPLY • link 3.1 years ago by Kevin Blighe 87k