Question: WGCNA and SC-RNA Seq data
9 months ago by
pennakiza60 wrote:


I have a dataset of single-cell expression data (at the moment working on CD4 cells only) from 4 patients. Would 4 patients be enough to get any significant results, considering that my sample number is essentially 1200 cells?

Thank you in advance!

wgcna rna-seq sc-rna seq • 1.0k views
modified 9 months ago • written 9 months ago by pennakiza60

Hi Kevin,

I have filtered my dataset for low counts, so I have ended up with ~850 genes, and WGCNA runs quite smoothly but the module-trait correlations that I see are quite weak. I was wondering if that is because I am working with so few genes or because all those cells come from only 4 patients.


written 9 months ago by pennakiza60

There could be a few reasons. So, you have 850 genes x ~1200 cells? I'm still not sure that WGCNA is the best fit for scRNA-seq data, and I believe that running WGCNA on PC eigenvectors would be better (as I explain in my answer, below). The cellular heterogeneity that comes with scRNA-seq datasets may be what is 'beating' WGCNA in this case, along with the fact that you are effectively dealing with 4 batches (4 samples). Or have you run it on the 'integrated' dataset after adjustment for batch?
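To make the batch point concrete, here is a minimal sketch in Python with NumPy (not the WGCNA R package itself). It compares a module-trait correlation with and without removing per-patient means. The data are simulated, the helper names `module_eigengene` and `center_by_group` are hypothetical, and per-patient centering is only a crude stand-in for a proper batch adjustment or integration step.

```python
import numpy as np

def module_eigengene(expr):
    """Simplified eigengene: first principal component of a cells x genes matrix."""
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

def center_by_group(values, groups):
    """Subtract each group's mean from its rows (crude batch adjustment)."""
    out = np.asarray(values, dtype=float).copy()
    for g in np.unique(groups):
        mask = groups == g
        out[mask] -= out[mask].mean(axis=0)
    return out

# Simulated data (assumption): 4 patients x 300 cells, one module of 20 genes.
rng = np.random.default_rng(0)
patient = np.repeat(np.arange(4), 300)
trait = rng.normal(size=patient.size)                # hypothetical per-cell trait
patient_shift = rng.normal(scale=2.0, size=4)[patient]
module_expr = (
    0.3 * trait[:, None]                             # shared trait signal
    + patient_shift[:, None]                         # patient-level offset
    + rng.normal(size=(patient.size, 20))            # gene-level noise
)

centered_expr = center_by_group(module_expr, patient)
raw_me = module_eigengene(module_expr)
adj_me = module_eigengene(centered_expr)

# The sign of a principal component is arbitrary, so compare absolute values.
r_raw = np.corrcoef(raw_me, trait)[0, 1]
r_adj = np.corrcoef(adj_me, trait)[0, 1]
print(f"|r| without patient centering: {abs(r_raw):.2f}")
print(f"|r| with patient centering:    {abs(r_adj):.2f}")
```

If the two values differ markedly on real data, that would suggest the patient effect is contributing to the weak module-trait correlations.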

You are literally the first person that I have ever heard of using WGCNA on scRNA-seq data.

written 9 months ago by Kevin Blighe70k

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. This comment belongs under Kevin's answer.

SUBMIT ANSWER is for new answers to original question.

modified 9 months ago • written 9 months ago by GenoMax96k
9 months ago by
Kevin Blighe70k (Republic of Ireland) wrote:

To run WGCNA on such a dataset, you will require a lot of RAM, assuming that you want to run it over the entire transcriptome of each cell. Moreover, I question what exactly it would mean when compared to the output of other methods such as t-SNE, UMAP, pseudotime analysis, etc.

None of us can stop you going ahead with this, but I just question what exactly it would mean. The aforementioned data reduction methods were designed specifically to reduce the computational burden of processing and interpreting scRNA-seq data. Thus, it may make more sense to run WGCNA on a certain number of principal components that account for an appreciable amount of explained variation, for example > 80%.
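As a rough illustration of picking that number of components, the sketch below (Python with NumPy, on simulated data) keeps the smallest set of leading PCs whose cumulative explained variance reaches 80%. The function name `pc_scores` and the simulated matrix are assumptions made for the example. The network step itself is not shown and would still be run on the returned scores.

```python
import numpy as np

def pc_scores(expr, threshold=0.80):
    """Return scores for the fewest leading PCs reaching `threshold` explained variance.

    expr is a cells x genes matrix; the second return value is the cumulative
    explained-variance curve over all components.
    """
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    # First index where the cumulative curve reaches the threshold, as a count.
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(s))
    return u[:, :k] * s[:k], cumulative

# Simulated data (assumption): 1200 cells x 850 genes driven by 5 latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(1200, 5))
loadings = rng.normal(size=(5, 850))
expr = latent @ loadings + 0.5 * rng.normal(size=(1200, 850))

scores, cumulative = pc_scores(expr, threshold=0.80)
print("components kept:", scores.shape[1])
```

The 80% cut-off is just the figure mentioned above. On real scRNA-seq data the cumulative curve is usually much flatter, so it is worth inspecting it before settling on a threshold.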


written 9 months ago by Kevin Blighe70k

Actually, the computational expense is not that high, especially if the adjacency matrix is filtered to remove genes with low variability and/or a low expression level.
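A minimal sketch of that idea in Python with NumPy (not the WGCNA R code): genes are filtered on mean and variance first, and an unsigned soft-threshold adjacency |cor|^beta is then built on what remains. The cut-offs `min_mean` and `min_var`, the function names, and the simulated counts are assumptions for illustration. beta = 6 mirrors the usual WGCNA default power.

```python
import numpy as np

def filter_genes(expr, min_mean=0.1, min_var=0.05):
    """Keep genes (columns) whose mean and variance pass hypothetical cut-offs."""
    keep = (expr.mean(axis=0) >= min_mean) & (expr.var(axis=0) >= min_var)
    return expr[:, keep], keep

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency: absolute Pearson correlation raised to the power beta."""
    cor = np.clip(np.corrcoef(expr, rowvar=False), -1.0, 1.0)
    return np.abs(cor) ** beta

# Simulated data (assumption): 300 cells x 200 genes of log-transformed counts.
rng = np.random.default_rng(3)
rates = rng.gamma(shape=0.5, scale=2.0, size=200)    # many lowly expressed genes
counts = rng.poisson(rates, size=(300, 200))
expr = np.log1p(counts)

filtered, keep = filter_genes(expr)
adj = soft_threshold_adjacency(filtered)
print(f"genes kept: {keep.sum()} of {keep.size}")
print("adjacency shape:", adj.shape)
```

Because the adjacency matrix grows with the square of the number of genes, halving the gene count cuts its memory footprint to roughly a quarter.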

The additional information would be the "wiring" of the gene expression network in different clusters, and the identification of potential key driver genes.
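One simple way to look at this, sketched in Python with NumPy on simulated data: compute each gene's connectivity (the sum of its adjacencies to all other genes) separately per cluster, then rank genes by it. The gene names, the two simulated clusters, and the helpers `connectivity` and `top_hubs` are hypothetical. High connectivity only flags a candidate hub and does not by itself show that a gene is a driver.

```python
import numpy as np

def connectivity(adj):
    """Sum of each gene's adjacencies to all other genes (self-adjacency excluded)."""
    return adj.sum(axis=1) - np.diag(adj)

def top_hubs(adj, gene_names, n=3):
    """Return the n genes with the highest connectivity as (name, value) pairs."""
    k = connectivity(adj)
    order = np.argsort(k)[::-1][:n]
    return [(gene_names[i], float(k[i])) for i in order]

rng = np.random.default_rng(2)
genes = [f"g{i}" for i in range(10)]                 # hypothetical gene names

def simulate_cluster(n_cells, wired):
    """Simulated cluster; if wired, g0 drives g1..g5 (assumption for the demo)."""
    expr = rng.normal(size=(n_cells, len(genes)))
    if wired:
        expr[:, 1:6] += 2.0 * expr[:, [0]]
    return expr

expr_a = simulate_cluster(600, wired=True)           # cluster with a driver gene
expr_b = simulate_cluster(600, wired=False)          # cluster without structure

# Unsigned soft-threshold adjacency with power 6, computed per cluster.
adj_a = np.abs(np.corrcoef(expr_a, rowvar=False)) ** 6
adj_b = np.abs(np.corrcoef(expr_b, rowvar=False)) ** 6

print("cluster A hubs:", top_hubs(adj_a, genes))
print("cluster B hubs:", top_hubs(adj_b, genes))
```

Comparing the rankings between clusters gives a first impression of how the wiring differs.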

written 7 months ago by thomas.mohr0

Of course.

written 7 months ago by Kevin Blighe70k