WGCNA and scRNA-seq data
2.0 years ago
Hello,

I have a dataset of single-cell expression data (at the moment working on CD4 cells only) from 4 patients. Would 4 patients be enough to get any significant results, considering that my sample number is essentially 1200 cells?

Hi Kevin,

I have filtered my dataset for low counts and ended up with ~850 genes. WGCNA runs quite smoothly, but the module-trait correlations that I see are quite weak. I was wondering whether that is because I am working with so few genes or because all those cells come from only 4 patients.

Penny

Could be a few reasons. So, you have 850 genes x ~1200 cells? I'm still not sure that WGCNA is best for scRNA-seq data, and I believe running WGCNA on PC eigenvectors would be better (as I explain in my answer, below). The cellular heterogeneity that comes with scRNA-seq datasets may be what is 'beating' WGCNA in this case, along with the fact that you are effectively dealing with 4 batches (4 samples). Or have you run it on the 'integrated' dataset after adjusting for batch?

You are literally the first person that I have ever heard of using WGCNA on scRNA-seq data.

It's enough for me. At some level, each cell can be treated as a sample.

To run WGCNA on such a dataset, you will require a lot of RAM, assuming that you want to run it over the entire transcriptome of each cell. Moreover, I question what exactly it would mean when compared to the output of other methods such as t-SNE, UMAP, pseudotime analysis, etc.

None of us can stop you from going ahead with this, but I just question what exactly it would mean. The aforementioned data reduction methods were designed specifically to reduce the computational burden of processing and interpreting scRNA-seq data. Thus, it may make more sense to run WGCNA on a certain number of principal components that together account for an appreciable amount of explained variation, e.g. > 80%.
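As a rough illustration of the "enough PCs to explain ~80% of the variation" idea, here is a minimal Python/NumPy sketch. The function name `pcs_for_variance`, the cells x genes layout, and the assumption that the matrix is already normalised and log-transformed are mine, not part of WGCNA or any scRNA-seq package.

```python
import numpy as np

def pcs_for_variance(expr, threshold=0.80):
    """Return PC scores for the smallest number of PCs whose cumulative
    explained variance reaches `threshold`.

    expr: cells x genes array (assumed normalised and log-transformed).
    """
    # Centre each gene so the SVD corresponds to PCA.
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)

    # Squared singular values are proportional to the variance of each PC.
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)

    # First index where the cumulative fraction reaches the threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    k = min(k, len(s))

    scores = u[:, :k] * s[:k]  # cells x k matrix of PC scores
    return scores, cumulative[:k]

# Hypothetical example on random data with the dimensions from this thread.
rng = np.random.default_rng(0)
expr = rng.normal(size=(1200, 850))
scores, cumulative = pcs_for_variance(expr, threshold=0.80)
print(scores.shape, round(float(cumulative[-1]), 3))
```

The `scores` matrix (cells x k) would then stand in for the gene expression matrix as the network input. Whether modules of PCs are biologically interpretable is a separate question that this sketch does not address.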

Kevin

Actually, the computational expense is not that high, especially if the adjacency matrix is filtered to remove genes with low variability and/or low expression.
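To make the filtering point concrete, here is a small Python/NumPy sketch that drops lowly expressed and low-variance genes before building an unsigned soft-threshold adjacency, |cor|^power, which is the general form WGCNA uses for unsigned networks. The helper names, the cut-offs (`min_mean`, `top_variable`) and `power=6` are illustrative assumptions, not recommended settings.

```python
import numpy as np

def filter_genes(expr, min_mean=0.1, top_variable=200):
    """Return sorted column indices of genes that pass a mean-expression
    cut-off and are among the `top_variable` most variable of those.

    expr: cells x genes array (assumed normalised expression values).
    """
    mean = expr.mean(axis=0)
    var = expr.var(axis=0)
    expressed = np.flatnonzero(mean >= min_mean)
    # Rank the expressed genes by variance, highest first.
    ranked = expressed[np.argsort(var[expressed])[::-1]]
    return np.sort(ranked[:top_variable])

def soft_threshold_adjacency(expr, power=6):
    """Unsigned adjacency: absolute Pearson correlation raised to `power`.

    Genes with zero variance would give NaN correlations, so filter first.
    """
    cor = np.corrcoef(expr, rowvar=False)  # genes x genes
    return np.abs(cor) ** power

# Hypothetical example: 1200 cells x 850 genes of random positive values.
rng = np.random.default_rng(1)
expr = rng.gamma(shape=2.0, scale=1.0, size=(1200, 850))
kept = filter_genes(expr, min_mean=0.1, top_variable=200)
adjacency = soft_threshold_adjacency(expr[:, kept])
print(adjacency.shape)
```

Because the adjacency matrix is genes x genes, its memory footprint grows with the square of the number of genes kept, which is why filtering before this step reduces the cost so much.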

The additional information would be the "wiring" of the gene expression network in different clusters, and the identification of potential key driver genes.

Of course.