Question

How to get genes that make up principal components

5.6 years ago

t-jim ▴ 30

Hello. It's my first time performing RNA-Seq. I used DESeq2 for the differential expression analysis and also did a PCA using DESeq2. Now I wanted to determine which genes PC1 and PC2 consist of, so I tried following this and this, which worked, but somehow the columns don't add up to 100.

Here is the code I used for determining the PCs:

rv = rowVars(assay(rld)) 
select = order(rv, decreasing=TRUE)[seq_len(min(500, length(rv)))]
pc = prcomp(t(assay(rld)[select,]))
loadings = as.data.frame(pc$rotation)
aload = abs(loadings)
sweep(aload, 2, colSums(aload), "/")
View(aload)

Here is a part of the table:

                    PC1
ENSMUSG00000069516  0,16400947
ENSMUSG00000040809  0,15439586
ENSMUSG00000058715  0,12774194
ENSMUSG00000015340  0,12758962
ENSMUSG00000002985  0,12352009
ENSMUSG00000030579  0,11727118
ENSMUSG00000030214  0,11579513
ENSMUSG00000002111  0,11005741
ENSMUSG00000024621  0,11005048
ENSMUSG00000036594  0,10466841
ENSMUSG00000032359  0,10333928
ENSMUSG00000059498  0,10205552
ENSMUSG00000031722  0,10130551
ENSMUSG00000058818  0,10016428
ENSMUSG00000059108  0,09731863
ENSMUSG00000043832  0,09717016
ENSMUSG00000069515  0,09668688
ENSMUSG00000060063  0,09568078
ENSMUSG00000095788  0,09558955
ENSMUSG00000035493  0,09474437
ENSMUSG00000059089  0,09471363
ENSMUSG00000074677  0,09465178
ENSMUSG00000026628  0,09463913
ENSMUSG00000018927  0,0945181

As you can see, the values don't add up to 100. It's the same for the other PCs. I would appreciate any advice or help regarding my problem.

RNA-Seq DESeq2 R • 6.1k views

updated 5.6 years ago by Kevin Blighe 87k • written 5.6 years ago by t-jim ▴ 30

score 4 · Answer 1 · 2018-09-25

The rotated component loadings will not add up to 100; however, the proportions of variance explained by the PCs will:

mat <- matrix(rexp(200, rate=.1), ncol=20)

project.pca <- prcomp(t(mat), scale. = FALSE)

# Determine the proportion of variance of each component
# Proportion of variance equals (PC stdev^2) / (sum all PCs stdev^2)
project.pca.proportionvariances <- ((project.pca$sdev ^ 2) / (sum(project.pca$sdev ^ 2))) * 100

sum(project.pca.proportionvariances)
[1] 100
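
To see why the loadings themselves do not sum to 100, it may help to check what they are normalised to. The following is a minimal sketch on simulated data (the object names `mat` and `pca` and the seed are illustrative only). Each column of the `rotation` matrix returned by `prcomp` is a unit-length eigenvector, so its squared entries sum to 1, while its absolute entries sum to no fixed value:

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# 10 simulated "genes" (rows) x 20 "samples" (columns)
mat <- matrix(rexp(200, rate = 0.1), ncol = 20)
pca <- prcomp(t(mat), scale. = FALSE)

# Squared loadings sum to 1 for every PC (unit-length eigenvectors)
sq_sums <- colSums(pca$rotation^2)

# Absolute loadings have no fixed sum
abs_sums <- colSums(abs(pca$rotation))

print(round(sq_sums, 6))
print(abs_sums)
```

So a column of absolute loadings only sums to 1 (or 100) if it is explicitly rescaled.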

-----------------------------------

Now I wanted to determine which genes PC1 and PC2 consist of

Every gene in your dataset has a value (a loading) for every PC - no individual gene 'belongs' to a particular PC. The loading for each gene-to-PC combination indicates the 'strength', in terms of covariance, of that gene's contribution to the respective PC, and its sign gives the direction. The genes with the largest absolute loadings on PC1 or PC2 are the ones that influence those components most.
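
As a rough sketch of how one might list the genes that contribute most to a given PC, the rows of the rotation matrix can be ranked by absolute loading. The example uses simulated data with made-up gene names; with the objects from the question, `pc$rotation` would take the place of `pca$rotation`. The helper `top_genes` is hypothetical, not part of DESeq2 or base R. Note also that `sweep()` returns a new object rather than modifying its input, so its result has to be assigned, and the rescaled columns then sum to 1 unless multiplied by 100:

```r
set.seed(1)  # arbitrary seed for a reproducible sketch

# Simulated matrix: 50 hypothetical genes (rows) x 12 samples (columns)
mat <- matrix(rexp(600, rate = 0.1), nrow = 50,
              dimnames = list(paste0("gene", 1:50), paste0("sample", 1:12)))
pca <- prcomp(t(mat), scale. = FALSE)

# Hypothetical helper: signed loadings of the n genes with the
# largest absolute loading on the chosen PC, in decreasing order
top_genes <- function(rotation, pc = "PC1", n = 10) {
  ord <- order(abs(rotation[, pc]), decreasing = TRUE)
  rotation[head(ord, n), pc]
}

top_pc1 <- top_genes(pca$rotation, "PC1", n = 10)
top_pc2 <- top_genes(pca$rotation, "PC2", n = 10)

# Relative contribution of each gene per PC, as a percentage of the
# summed absolute loadings; the result of sweep() must be assigned
aload <- abs(pca$rotation)
contrib <- sweep(aload, 2, colSums(aload), "/") * 100

print(top_pc1)
print(colSums(contrib)[1:2])
```

The percentages in `contrib` are only a convenient rescaling for ranking genes within one PC; they are not proportions of variance explained.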

You may take a look at my other answers:

Kevin