Question

WGCNA gene selection: gene significance or LASSO?

0

14 months ago

janinubinu • 0

Hello,

I am having some trouble understanding how to choose a method for gene selection to identify genes from top WGCNA modules that most correlate with a clinical trait. I understand that WGCNA has a built-in feature called gene significance (GS), which is just the Pearson correlation between the gene and the clinical trait, and you could set a GS cut-off to identify genes from the modules that most correlate with your clinical trait.
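For concreteness, a GS cut-off can be sketched in base R as below. This is only an illustration on simulated data: `expr`, `trait`, `module_genes`, and the 0.5 cut-off are made-up stand-ins, not values from a real analysis. The WGCNA tutorials commonly take the absolute value of the correlation, which is what this sketch assumes.

```r
# Simulated stand-ins (hypothetical): 30 samples x 50 genes and one numeric trait.
set.seed(1)
n_samples <- 30
n_genes <- 50
expr <- matrix(rnorm(n_samples * n_genes), nrow = n_samples,
               dimnames = list(NULL, paste0("gene", seq_len(n_genes))))
trait <- rnorm(n_samples)

# Make the first five genes track the trait so that something passes the cut-off.
expr[, 1:5] <- expr[, 1:5] + 2 * trait

# Hypothetical module assignment: the first 20 genes form the module of interest.
module_genes <- colnames(expr)[1:20]

# Gene significance: absolute Pearson correlation of each module gene with the trait.
gs <- abs(cor(expr[, module_genes], trait, method = "pearson"))[, 1]

# Keep module genes above an arbitrary, illustrative cut-off.
gs_cutoff <- 0.5
selected <- sort(gs[gs > gs_cutoff], decreasing = TRUE)
print(selected)
```

Each gene is scored on its own here, so two genes that carry the same information will both pass the cut-off.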

However, I understand that you can also use LASSO for this, keeping the genes that have non-zero coefficients. From what I have read so far, LASSO removes genes that DO correlate with the clinical trait but whose correlation is weak, or that are redundant in that they share a similar correlation pattern with another gene. My understanding is that LASSO is better suited to cases where you have a large number of genes.
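As a rough sketch of what the LASSO route might look like, the example below uses the glmnet package on simulated co-expressed genes. The data, the names `module_signal`, `expr`, and `trait`, the choice of `lambda.1se`, and the 0.5 correlation cut-off are all assumptions made for illustration.

```r
library(glmnet)

set.seed(1)
n_samples <- 60
n_genes <- 40

# A shared signal makes the simulated genes co-expressed, as in a WGCNA module.
module_signal <- rnorm(n_samples)
expr <- sapply(seq_len(n_genes),
               function(i) module_signal + rnorm(n_samples, sd = 0.5))
colnames(expr) <- paste0("gene", seq_len(n_genes))
trait <- module_signal + rnorm(n_samples, sd = 0.5)

# alpha = 1 selects the LASSO penalty; lambda is chosen by cross-validation.
cv_fit <- cv.glmnet(expr, trait, alpha = 1)

# Genes with non-zero coefficients at the cross-validated lambda.
coefs <- as.matrix(coef(cv_fit, s = "lambda.1se"))[, 1]
lasso_genes <- setdiff(names(coefs)[coefs != 0], "(Intercept)")

# For comparison: how many genes pass a GS-style correlation cut-off.
gs <- abs(cor(expr, trait))[, 1]
n_gs <- sum(gs > 0.5)

cat("LASSO kept", length(lasso_genes), "genes;", n_gs, "pass the GS cut-off\n")
```

Because LASSO fits all genes jointly, a gene can end up with a zero coefficient even though its own correlation with the trait is high. Which of several correlated genes it keeps can also change with the cross-validation folds.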

Am I missing anything? Why would one use LASSO over a GS cut-off for identifying top genes?

LASSO WGCNA • 938 views

updated 14 months ago by rpolicastro 13k • written 14 months ago by janinubinu • 0

2

I tend to lean towards network-based centrality approaches (like Kleinberg's hub score or betweenness) when trying to find influential genes in a module. Intuitively, centrality measures find the genes that are "most connected" to other genes in the module. Here's a brief example using igraph that is conceptually similar to the WGCNA function chooseTopHubInEachModule (which uses the adjacency matrix instead of the TOM, and a different centrality measure than Kleinberg's). I prefer exporting the network to igraph to get more control over the analysis and more plotting options.

tom is your topological overlap matrix (with gene names as its row and column names, so that the scores can be indexed by gene), and module_features is a character vector of the genes in your trait-correlated module.

library("igraph")

# Construct a network using your TOM.
network <- tom |>
  graph_from_adjacency_matrix(mode="upper", weighted=TRUE) |>
  simplify()

# Return the top 5 genes by Kleinberg's hub score.
top_hubs <-
  hub_score(network)$vector[module_features] |>
  sort(decreasing=TRUE) |>
  head(n=5)

igraph has a whole bunch of network measures to choose from if you prefer a different method.
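A few of those alternatives might look like the sketch below. The small random matrix is only a stand-in for a real TOM, and `genes` and the top-3 cut are hypothetical. Unlike the snippet above, this sketch restricts the matrix to the module before building the graph, which is a separate design choice and scores genes only against other module members.

```r
library(igraph)

# Hypothetical stand-in for a TOM: a small symmetric matrix with gene names.
set.seed(1)
genes <- paste0("gene", 1:8)
tom <- matrix(runif(64), nrow = 8, dimnames = list(genes, genes))
tom <- (tom + t(tom)) / 2
diag(tom) <- 1
module_features <- genes[1:5]

# Restrict the network to the module, then drop the self-loops from the diagonal.
network <- tom[module_features, module_features] |>
  graph_from_adjacency_matrix(mode = "upper", weighted = TRUE) |>
  simplify()

# Weighted degree (sum of edge weights), analogous to intramodular connectivity.
node_strength <- strength(network)

# Eigenvector centrality; the "weight" edge attribute is used by default.
eigen_cent <- eigen_centrality(network)$vector

# Betweenness treats weights as distances, so convert similarity to a cost first.
btw <- betweenness(network, weights = 1 / E(network)$weight)

# Top 3 genes by weighted degree.
top_by_strength <- sort(node_strength, decreasing = TRUE) |>
  head(n = 3)
print(top_by_strength)
```

On a TOM-derived network every pair of genes is connected, so weighted measures such as strength or eigenvector centrality tend to be more informative than unweighted degree.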

14 months ago by rpolicastro 13k

1

"LASSO does remove genes that DO correlate with the clinical trait but have a weak correlation or are redundant in that they share similar correlation pattern with another gene. My understanding is that LASSO is better for when you have a large number of genes."

Disclaimer: I am not familiar with LASSO, but if it really drops a gene just because it shares a similar correlation pattern with another gene, then I don't think it will work very well in this context. Genes in a module, especially the hub genes, are supposed to share similar correlation patterns with each other. That is how WGCNA works.

Maybe this post could be of some help

14 months ago by andres.firrincieli 3.6k