Question: Extracting Genes that contribute to 1st PC
0
gravatar for reara
7 weeks ago by
reara10
reara10 wrote:

I am trying to figure out how to extract the genes that contribute to the 1st Principal Component of a Module in WGCNA. Any advice on methodology would be much appreciated.

rna-seq pca wgcna gene • 237 views
ADD COMMENTlink modified 5 weeks ago by andres.firrincieli710 • written 7 weeks ago by reara10

Not too familiar with this particular part of WGCNA; however, please show relevant code that you have used and also run str() on your key objects (and paste the output here).

ADD REPLYlink written 7 weeks ago by Kevin Blighe63k

`eigenmatrix.black.na <- t(na.omit(t(eigenmatrix.black)))

df_black <- read.csv('eigengmatrix.black.na.csv', header = TRUE)

X_black <- df_black

X_black <- as.matrix(df_black)

nPC=1

n=dim(X_black)[1]

p=dim(X_black)[2]

svd1 = svd(X_black, nu = min(n, p, nPC), nv = min(n, p, nPC))

svd1$d

PCA

pZ <- prcomp(X_black, rank. = 3) #to get 1st 3 components

summary(pZ) pZ$x #to get components`

below is the eigenmatrix.black.na which consists of the genes and expression values of all the genes from the black module

This is what the eigenmatrix.black.na looks like

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by reara10

When you are asking for help, the idea is to make it as easy as possible for people to find the answer. Saying you need help with "a module in WGCNA" does not make it easy. Specifying the command line you used would at least give people a hint.

ADD REPLYlink written 7 weeks ago by swbarnes28.2k

Modules are the standard output of any WGCNA analyses. Hence used that word assuming familiarity with such output.

ADD REPLYlink written 7 weeks ago by reara10
0
gravatar for karl.stamm
7 weeks ago by
karl.stamm3.7k
United States
karl.stamm3.7k wrote:

The nature of principle components is that all genes will be involved. If you mean to pick out the top5, be sure they're scaled or you'll just get the highest expression genes. The Principle Components analysis must have the genes factors, and you'd have to pick out the ones with max absolute value coefficient. Not sure about WCGNA, this is for plain PCA, which we presume is inside WCGNA.

ADD COMMENTlink written 7 weeks ago by karl.stamm3.7k

The Principle Components analysis must have the genes factors, and you'd have to pick out the ones with max absolute value coefficient.

Could you clarify the above line? How would I go about doing that?

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by reara10

He is referring [I think] to the component / variable loadings. I am not sure that WGCNA supports that, but my own package, PCAtools, does: https://github.com/kevinblighe/PCAtools

ADD REPLYlink written 5 weeks ago by Kevin Blighe63k

Thank you, Kevin. Would your package help me identify and extract which genes in a module contribute to the Eigengene?

ADD REPLYlink written 5 weeks ago by reara10

Depends on how you use it. To mix PCAtools with WGCNA's module output could be time-consuming (for me, to find out how to do it). I am sure that there is a way via WGCNA though. By the way, I used to work in your lab - you could ask Professor Lasky-Su's postdoc (now Investigator) and she'll probably know a way.

ADD REPLYlink written 5 weeks ago by Kevin Blighe63k

Ah I see-which one, though? We have a quite a few with a similar trajectory. Anyone else who you might be able to suggest?

ADD REPLYlink written 5 weeks ago by reara10

Could ask Hooman in 'The Cave', too. Otherwise, I may take a look tomorrow. Also, perhaps the answer by swbarnes2 helps(?)

ADD REPLYlink written 5 weeks ago by Kevin Blighe63k

I think that this is what you need, no? - https://rdrr.io/cran/WGCNA/man/moduleEigengenes.html

ADD REPLYlink written 5 weeks ago by Kevin Blighe63k
1

haha The Cave is where it all happens, to be sure. Yes you are absolutely right with the moduleEigengenes which was the first thing I tried. However, it only gives the 1st PC which is an abstract value. I want the actual genes (and their values of course) that contribute to give that abstract value.

ADD REPLYlink written 5 weeks ago by reara10

Indeed, my desk was the first on the left as you walk in the door. Was there in 2016/7.

I want the actual genes (and their values of course) that contribute to give that abstract value.

Isn't this just the original expression value, in that case? WGCNA first identifies the modules by constructing a correlation network, and then it performs PCA on the genes in each module, reporting thereafter the loadings for just the first PC (PC1) from each module. This is and is not a good approach, but this is one of the limitations of WGCNA.

ADD REPLYlink written 5 weeks ago by Kevin Blighe63k
0
gravatar for swbarnes2
5 weeks ago by
swbarnes28.2k
United States
swbarnes28.2k wrote:

Now that you have code

pZ <- prcomp(X_black, rank. = 3) #to get 1st 3 components

So you are just using prcomp, which is part of base R.

pZ$rotation will get you the loadings for the genes for each PC

https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/prcomp

ADD COMMENTlink modified 5 weeks ago • written 5 weeks ago by swbarnes28.2k

ok so could you confirm what my input matrix should be?

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by reara10
0
gravatar for andres.firrincieli
5 weeks ago by
andres.firrincieli710 wrote:

Hi reara,

With signed netwroks, In WGCNA the genes that mostly contributes to the 1st PC are the ones with the highest Module Membership or Intramodulat Connectivity

For Unsigneed networks, the genes that mostly contributes to the 1st PC are the ones with the highest absolute value of Module Membership or Intramodulat Connectivity

ADD COMMENTlink written 5 weeks ago by andres.firrincieli710

Thank you so much-I did try that before and this is the code and output since mine is an unsigned network. How would I find out the actual genes from below result in a tabular format:

intramodularconnectivity = intramodularConnectivity.fromExpr(datExpr, moduleColors, 
                              corFnc = "cor", corOptions = "use = 'p'",
                              weights = NULL,
                              distFnc = "dist", distOptions = "method = 'euclidean'",
                              power = 6,
                              scaleByMax = FALSE,
                              ignoreColors = if (is.numeric(colors)) 0 else "grey",
                              getWholeNetworkConnectivity = TRUE)

enter image description here

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by reara10

The rownames of intramodularconnectivity should have the gene_id. Could you attach the chunck of code used to prepare datExpr?

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by andres.firrincieli710

Just following the Horvath standard tutorial:

datExpr0 = as.data.frame(t(femData[, -c(1:8)]));
names(datExpr0) = femData$substanceBXH;
rownames(datExpr0) = names(femData)[-c(1:8)];

# Plot a line to show the cut
abline(h = 7, col = "red");
# Determine cluster under the line
clust = cutreeStatic(sampleTree, cutHeight = 7, minSize = 10)
table(clust)
# clust 1 contains the samples we want to keep.
keepSamples = (clust==1)
datExpr = datExpr0[keepSamples, ]
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by reara10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 755 users visited in the last hour