Exporting the list of top loading genes from a particular principal component
1
2
Entering edit mode
3.1 years ago
n.tear ▴ 80

I have just used PCA tool to create a PC loadings graph in order to identify the genes explaingin the variance for PC3 using the following code:

library(PCAtools)
rld.sm.output <- as.data.frame(assay(rld.sm))
metadata <- coldata

require(org.Hs.eg.db)
mapping <- mapIds(
  org.Hs.eg.db,
  keys = rownames(rld.sm.output),
  column = 'SYMBOL',
  keytype = 'ENSEMBL')
rownames(rld.sm.output) <- make.unique(ifelse(is.na(mapping), rownames(rld.sm.output), mapping))

pca.project <- pca(rld.sm.output, metadata = metadata, removeVar = 0.1)

plotloadings(pca.project,
    rangeRetain = 0.05,
    labSize = 4.0,
    title = 'Loadings plot - top 5% contributing',
    subtitle = 'PC1, PC2, PC3, PC4, PC5',
    caption = 'Top 5% variables',
    drawConnectors = TRUE)

enter image description here

My Question:

How can I extract the top loading genes from the PC of interest (PC3 in this case)? When I extract the loadings and then sort by largest to smallest, the genes dont match my loadings graph (apart from the two genes with the highest loading):

write.csv(pca.project$loadings[1:3], file='PCAloadings.csv')

enter image description here

PCAtools • 1.5k views
ADD COMMENT
2
Entering edit mode
3.1 years ago

Hi again, it's literally about filtering on that loadings object. Keep in mind, however, that the values can be positive or negative, with both having equal weighting and having the same sign as the values on the corresponding bi-plot.

For example, CAPN8, with a negative score of -0.00605, will be associated with whatever samples are shifted toward the negative end of your PC1 on a bi-plot.

For the plotloadings() function, I think that the way that I filter is by taking, e.g., the genes that fall within the top 5% of the range of values for each PC. There are obviously many other ways to filter these, too.

Kevin

ADD COMMENT
0
Entering edit mode

Hi again Kevin,

Thank you for your response it has helped me to understand better. But one thing:

In the csv file I have shared, I have sorted by PC3 loadings by size in column D.

I see PI3 and ENSG00000267303 at the top of column D...

Consequently I do see PI3 and ENSG00000267303 in the PC3 section of my plotloadings() graph...

but why do I not see PPP1R1B in the graph if the loadings .csv suggests it is also in the top 5%?

Hope that makes sense

ADD REPLY
0
Entering edit mode

plotloadings usually displays a message to terminal when the function is executed. Is the gene listed there? It can be that it falls just outside the top 5% of the range

ADD REPLY

Login before adding your answer.

Traffic: 1754 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6