Question: co-expression analysis from scRNA-seq data
4 months ago, hubenxia123 wrote:

I have downloaded a public expression matrix from a scRNA-seq study. Does anyone know how to perform gene-gene co-expression analysis, as in the paper "Molecular Diversity and Specializations among the Cells of the Adult Mouse Brain"? Best,

4 months ago, Kevin Blighe wrote:


You can read the methods of the paper that you cite and follow what the authors did. Go here and then go to STAR Methods.

The two sections within those methods that you will want to review are:

  • ICA based analysis and clustering
  • Correlation analysis across cell populations



If I knew how to do that, I wouldn't have asked this question.

written 4 months ago by hubenxia123

Hello, which part, specifically, are you finding difficult to follow? I took a closer look myself and can deduce the following rough steps to help you get started:

Step 1 - filtering

  • Filter out cells with fewer than 400 expressed genes
  • Retain only genes that are highly variable across all tissues (you can use your own metrics, if you wish)
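
As a rough illustration of Step 1, here is a minimal sketch in plain Python (used only for illustration; the thread itself points to R packages). It assumes a cells-by-genes count matrix held as a list of rows, and it ranks genes by simple variance, which is an assumption standing in for whatever variability metric you prefer and is not necessarily what the authors used. The names `filter_cells` and `top_variable_genes` are hypothetical.

```python
from statistics import pvariance


def filter_cells(counts, min_genes=400):
    """Keep cells (rows) in which at least `min_genes` genes have a non-zero count."""
    return [cell for cell in counts if sum(1 for c in cell if c > 0) >= min_genes]


def top_variable_genes(counts, n_top):
    """Return the column indices of the `n_top` genes with the largest variance.

    Plain variance is an assumed, simplistic variability metric.
    """
    variances = [pvariance(col) for col in zip(*counts)]
    ranked = sorted(range(len(variances)), key=lambda g: variances[g], reverse=True)
    return sorted(ranked[:n_top])


# Toy matrix: 4 cells (rows) x 4 genes (columns).
# The threshold is lowered from 400 to 3 only so the toy data can show the effect.
toy_counts = [
    [5, 0, 3, 1],
    [0, 0, 2, 0],
    [9, 0, 1, 1],
    [1, 0, 4, 1],
]
kept_cells = filter_cells(toy_counts, min_genes=3)
variable_genes = top_variable_genes(kept_cells, n_top=2)
print(len(kept_cells), variable_genes)
```

On a real matrix you would keep the default `min_genes=400` and choose `n_top` to suit your data.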

Step 2 - ICA (independent component analysis)

  • Convert highly variable gene matrices to Z-scores ("[The] selected genes were then centered and scaled across all cells")
  • Perform ICA with the fastICA package in R, configured to output the first 60 components, separately for each tissue.
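
The Z-score conversion in Step 2 can be sketched as follows, again in plain Python for illustration only. The sketch assumes a cells-by-genes matrix of already normalised expression values and uses the sample standard deviation; `zscore_genes` is a hypothetical helper name. The ICA itself is not reproduced here, since that part is delegated to fastICA.

```python
from statistics import mean, stdev


def zscore_genes(expr):
    """Centre and scale each gene (column) across all cells (rows)."""
    scaled_columns = []
    for col in zip(*expr):
        mu, sd = mean(col), stdev(col)
        # A gene with zero variance cannot be scaled; map it to zeros (an assumption).
        scaled_columns.append([(x - mu) / sd if sd > 0 else 0.0 for x in col])
    # Transpose back to cells x genes.
    return [list(row) for row in zip(*scaled_columns)]


# Toy matrix: 3 cells x 2 genes; the second gene is constant across cells.
toy_expr = [
    [1.0, 2.0],
    [3.0, 2.0],
    [5.0, 2.0],
]
z = zscore_genes(toy_expr)
print(z)
```

The resulting matrix is what would be passed to the ICA step, one tissue at a time.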

Step 3 - KNN clustering

Perform clustering on the 60 ICA components using the clustering implementation in Seurat. Basically, re-use Seurat's functions FindNeighbors() and FindClusters(). I use these in a function in a package that I'm currently developing, to give you an idea:
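
For readers who want to see the idea behind a shared-nearest-neighbour graph, here is a conceptual sketch in plain Python. It is not the package function mentioned above and not Seurat's implementation; it only illustrates the general approach of finding each cell's k nearest neighbours and weighting cell pairs by the Jaccard overlap of those neighbour sets. The names `knn_sets` and `snn_edges` and the `prune` parameter are hypothetical, and the community-detection step that turns the graph into clusters is omitted.

```python
from math import dist


def knn_sets(points, k):
    """For each point, return the index set of its k nearest points (Euclidean).

    The point itself is at distance 0, so it normally appears in its own set.
    """
    neighbours = []
    for p in points:
        order = sorted(range(len(points)), key=lambda j: dist(p, points[j]))
        neighbours.append(set(order[:k]))
    return neighbours


def snn_edges(points, k, prune=0.0):
    """Weight each pair of cells by the Jaccard overlap of their k-NN sets.

    Pairs whose weight does not exceed `prune` are dropped.
    """
    nn = knn_sets(points, k)
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            weight = len(nn[i] & nn[j]) / len(nn[i] | nn[j])
            if weight > prune:
                edges[(i, j)] = weight
    return edges


# Toy "component scores": two well-separated groups of three cells each.
toy_cells = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
edges = snn_edges(toy_cells, k=3)
print(sorted(edges))
```

In the toy data, edges survive only within each group, so a modularity-based clustering of this graph would separate the two groups. With real data, each point would be a cell's 60 component scores rather than a 2D coordinate.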


That should bring you up to the line "To identify finer substructure among these classes, classes with more than 200 cells were selected for subclustering", after which they appear to begin a second round of ICA on a finer subset of genes.

Unfortunately, following bioinformatics methods can be a nightmare, because it is impossible to capture accurately in plain English every minute detail that a comprehensive methodology requires.

modified 4 months ago • written 4 months ago by Kevin Blighe

You might also want to take a look at this article to get ideas for alternatives to Pearson correlation.
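
Which alternatives that article covers is not stated here. As one commonly used example, the sketch below compares Pearson correlation with the rank-based Spearman correlation for a single gene pair, in plain Python. The function names and the toy expression vectors are hypothetical and serve only to illustrate the difference.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (both must vary)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy expression of two genes across five cells: monotone but far from linear.
gene_a = [0, 1, 2, 3, 4]
gene_b = [1, 2, 4, 8, 100]
r_pearson = pearson(gene_a, gene_b)
r_spearman = spearman(gene_a, gene_b)
print(round(r_pearson, 3), round(r_spearman, 3))
```

Because Spearman depends only on rank order, the single extreme value in `gene_b` lowers the Pearson coefficient but leaves the Spearman coefficient untouched, which is one reason rank-based measures are often considered for sparse, skewed single-cell data.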

written 4 months ago by kristoffer.vittingseerup