Extracting Genes that contribute to 1st PC
3
0
Entering edit mode
3.8 years ago
reara ▴ 30

I am trying to figure out how to extract the genes that contribute to the 1st Principal Component of a Module in WGCNA. Any advice on methodology would be much appreciated.

wgcna RNA-Seq PCA gene rna-seq • 2.5k views
ADD COMMENT
0
Entering edit mode

Not too familiar with this particular part of WGCNA; however, please show relevant code that you have used and also run str() on your key objects (and paste the output here).

ADD REPLY
0
Entering edit mode

`eigenmatrix.black.na <- t(na.omit(t(eigenmatrix.black)))

df_black <- read.csv('eigengmatrix.black.na.csv', header = TRUE)

X_black <- df_black

X_black <- as.matrix(df_black)

nPC=1

n=dim(X_black)[1]

p=dim(X_black)[2]

svd1 = svd(X_black, nu = min(n, p, nPC), nv = min(n, p, nPC))

svd1$d

PCA

pZ <- prcomp(X_black, rank. = 3) #to get 1st 3 components

summary(pZ) pZ$x #to get components`

below is the eigenmatrix.black.na which consists of the genes and expression values of all the genes from the black module

This is what the eigenmatrix.black.na looks like

ADD REPLY
0
Entering edit mode

When you are asking for help, the idea is to make it as easy as possible for people to find the answer. Saying you need help with "a module in WGCNA" does not make it easy. Specifying the command line you used would at least give people a hint.

ADD REPLY
0
Entering edit mode

Modules are the standard output of any WGCNA analyses. Hence used that word assuming familiarity with such output.

ADD REPLY
0
Entering edit mode
3.8 years ago

The nature of principle components is that all genes will be involved. If you mean to pick out the top5, be sure they're scaled or you'll just get the highest expression genes. The Principle Components analysis must have the genes factors, and you'd have to pick out the ones with max absolute value coefficient. Not sure about WCGNA, this is for plain PCA, which we presume is inside WCGNA.

ADD COMMENT
0
Entering edit mode

The Principle Components analysis must have the genes factors, and you'd have to pick out the ones with max absolute value coefficient.

Could you clarify the above line? How would I go about doing that?

ADD REPLY
0
Entering edit mode

He is referring [I think] to the component / variable loadings. I am not sure that WGCNA supports that, but my own package, PCAtools, does: https://github.com/kevinblighe/PCAtools

ADD REPLY
0
Entering edit mode

Thank you, Kevin. Would your package help me identify and extract which genes in a module contribute to the Eigengene?

ADD REPLY
0
Entering edit mode

Depends on how you use it. To mix PCAtools with WGCNA's module output could be time-consuming (for me, to find out how to do it). I am sure that there is a way via WGCNA though. By the way, I used to work in your lab - you could ask Professor Lasky-Su's postdoc (now Investigator) and she'll probably know a way.

ADD REPLY
0
Entering edit mode

Ah I see-which one, though? We have a quite a few with a similar trajectory. Anyone else who you might be able to suggest?

ADD REPLY
0
Entering edit mode

Could ask Hooman in 'The Cave', too. Otherwise, I may take a look tomorrow. Also, perhaps the answer by swbarnes2 helps(?)

ADD REPLY
0
Entering edit mode

I think that this is what you need, no? - https://rdrr.io/cran/WGCNA/man/moduleEigengenes.html

ADD REPLY
1
Entering edit mode

haha The Cave is where it all happens, to be sure. Yes you are absolutely right with the moduleEigengenes which was the first thing I tried. However, it only gives the 1st PC which is an abstract value. I want the actual genes (and their values of course) that contribute to give that abstract value.

ADD REPLY
0
Entering edit mode

Indeed, my desk was the first on the left as you walk in the door. Was there in 2016/7.

I want the actual genes (and their values of course) that contribute to give that abstract value.

Isn't this just the original expression value, in that case? WGCNA first identifies the modules by constructing a correlation network, and then it performs PCA on the genes in each module, reporting thereafter the loadings for just the first PC (PC1) from each module. This is and is not a good approach, but this is one of the limitations of WGCNA.

ADD REPLY
0
Entering edit mode
3.8 years ago

Now that you have code

pZ <- prcomp(X_black, rank. = 3) #to get 1st 3 components

So you are just using prcomp, which is part of base R.

pZ$rotation will get you the loadings for the genes for each PC

https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/prcomp

ADD COMMENT
0
Entering edit mode

ok so could you confirm what my input matrix should be?

ADD REPLY
0
Entering edit mode
3.8 years ago

Hi reara,

With signed netwroks, In WGCNA the genes that mostly contributes to the 1st PC are the ones with the highest Module Membership or Intramodulat Connectivity

For Unsigneed networks, the genes that mostly contributes to the 1st PC are the ones with the highest absolute value of Module Membership or Intramodulat Connectivity

ADD COMMENT
0
Entering edit mode

Thank you so much-I did try that before and this is the code and output since mine is an unsigned network. How would I find out the actual genes from below result in a tabular format:

intramodularconnectivity = intramodularConnectivity.fromExpr(datExpr, moduleColors, 
                              corFnc = "cor", corOptions = "use = 'p'",
                              weights = NULL,
                              distFnc = "dist", distOptions = "method = 'euclidean'",
                              power = 6,
                              scaleByMax = FALSE,
                              ignoreColors = if (is.numeric(colors)) 0 else "grey",
                              getWholeNetworkConnectivity = TRUE)

enter image description here

ADD REPLY
0
Entering edit mode

The rownames of intramodularconnectivity should have the gene_id. Could you attach the chunck of code used to prepare datExpr?

ADD REPLY
0
Entering edit mode

Just following the Horvath standard tutorial:

datExpr0 = as.data.frame(t(femData[, -c(1:8)]));
names(datExpr0) = femData$substanceBXH;
rownames(datExpr0) = names(femData)[-c(1:8)];

# Plot a line to show the cut
abline(h = 7, col = "red");
# Determine cluster under the line
clust = cutreeStatic(sampleTree, cutHeight = 7, minSize = 10)
table(clust)
# clust 1 contains the samples we want to keep.
keepSamples = (clust==1)
datExpr = datExpr0[keepSamples, ]
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
ADD REPLY

Login before adding your answer.

Traffic: 2724 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6