Question: co-expression analysis from scRNA-seq data
4 months ago, hubenxia123 wrote:

I have downloaded a public expression matrix from a scRNA-seq experiment. Does anyone know how to perform gene-gene co-expression analysis, as in the paper "Molecular Diversity and Specializations among the Cells of the Adult Mouse Brain"? Best,

4 months ago, Kevin Blighe wrote:

Hey,

You can read the methods of the work that you cite, and, in that way, follow what the authors did. Go here and then go to STAR Methods.

The 2 sections within those methods that you will want to review are:

  • ICA based analysis and clustering
  • Correlation analysis across cell populations

Kevin


If I knew how to do that, I wouldn't have asked this question.

Reply written 4 months ago by hubenxia123

Hello, which part, specifically, are you finding difficult to follow? I took a closer look myself and can deduce the following rough steps to help you get started:

Step 1 - filtering

  • Filter out cells with fewer than 400 expressed genes
  • Retain only the genes that are highly variable across all tissues (you can use your own metrics, if you wish)
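
As a rough illustration of Step 1, here is a minimal Python sketch (the paper's own analysis is in R, so this is a stand-in, not the authors' code). It assumes a cells x genes matrix of raw counts. The 400-gene cutoff comes from the steps above; the function name `filter_cells_and_genes`, the depth normalisation, the dispersion metric, and `n_top_genes` are my own assumptions.

```python
import numpy as np

def filter_cells_and_genes(counts, min_genes=400, n_top_genes=2000):
    """counts: cells x genes array of raw counts.

    Returns (normalised matrix, kept cell indices, kept gene indices).
    """
    counts = np.asarray(counts, dtype=float)

    # A gene counts as "expressed" in a cell if it has at least one count.
    genes_per_cell = (counts > 0).sum(axis=1)
    keep_cells = np.flatnonzero(genes_per_cell >= min_genes)
    counts = counts[keep_cells]

    # Assumption: normalise each cell to its total and log-transform, so
    # that sequencing depth does not dominate the variability ranking.
    totals = np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    norm = np.log1p(counts / totals * 1e4)

    # Assumption: rank genes by variance-to-mean ratio (dispersion).
    mean = norm.mean(axis=0)
    var = norm.var(axis=0)
    dispersion = np.where(mean > 0, var / np.maximum(mean, 1e-12), 0.0)
    n_top = min(n_top_genes, counts.shape[1])
    keep_genes = np.sort(np.argsort(dispersion)[::-1][:n_top])

    return norm[:, keep_genes], keep_cells, keep_genes
```

Any other variable-gene metric can be swapped in for the dispersion ranking; only the cell filter is taken from the methods.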

Step 2 - ICA (independent component analysis)

  • Convert highly variable gene matrices to Z-scores ("[The] selected genes were then centered and scaled across all cells")
  • Perform ICA using the fastICA package in R, configured to output the first 60 components, and run it separately on each tissue.
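
Step 2 can be sketched as follows. This is Python with scikit-learn's `FastICA` as a stand-in for the R fastICA package, so the results will not match the R implementation number for number. The function names, the cells x genes orientation, `max_iter`, and the seed are my own assumptions; only the Z-scoring and the 60 components come from the methods.

```python
import numpy as np
from sklearn.decomposition import FastICA

def zscore_genes(expr):
    """Centre and scale each gene (column) across all cells (rows)."""
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=0)
    sd = expr.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # constant genes stay at zero instead of dividing by 0
    return (expr - mean) / sd

def run_ica(expr, n_components=60, seed=0):
    """Return (cell scores, gene loadings) for a cells x genes matrix."""
    z = zscore_genes(expr)
    ica = FastICA(n_components=n_components, whiten="unit-variance",
                  max_iter=1000, random_state=seed)
    cell_scores = ica.fit_transform(z)  # cells x components
    gene_loadings = ica.mixing_         # genes x components
    return cell_scores, gene_loadings
```

To follow the "separately on each tissue" part, you would call `run_ica()` once per tissue-specific matrix instead of on the pooled data.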

Step 3 - KNN clustering

Perform clustering on the 60 ICA components using the clustering implementation in Seurat. Basically, re-use Seurat's functions FindNeighbors() and FindClusters(). To give you an idea, I use these in a function in a package that I'm currently developing: https://github.com/kevinblighe/scToolkit/blob/master/R/clusKNN.R
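
To show what the neighbour step does conceptually, here is a small Python sketch of a shared-nearest-neighbour (SNN) graph built from the ICA cell scores. It is not Seurat's code. The defaults `k=20` and `prune=1/15` mirror what I understand Seurat's `k.param` and `prune.SNN` defaults to be, so treat them as assumptions and check the Seurat documentation. The community detection that FindClusters() then runs on this graph is not reproduced here.

```python
import numpy as np

def snn_graph(embedding, k=20, prune=1 / 15):
    """Shared-nearest-neighbour graph from a cells x components matrix.

    Returns a symmetric cells x cells array of Jaccard weights.
    """
    x = np.asarray(embedding, dtype=float)
    n = x.shape[0]
    k = min(k, n)

    # Pairwise squared Euclidean distances (fine for a few thousand cells).
    sq = (x ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2 * x @ x.T

    # Each cell's k nearest neighbours, with the cell itself always included.
    np.fill_diagonal(dist, -np.inf)
    nn = np.argsort(dist, axis=1)[:, :k]
    member = np.zeros((n, n))
    member[np.arange(n)[:, None], nn] = 1.0

    # Jaccard index between the neighbour sets of every pair of cells:
    # |A and B| / |A or B|, where both sets have exactly k members.
    shared = member @ member.T
    jaccard = shared / (2 * k - shared)
    jaccard[jaccard < prune] = 0.0  # drop weak edges
    return jaccard
```

The dense n x n arrays keep the sketch short but will not scale to very large datasets, where a sparse k-nearest-neighbour search is the usual choice.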

--------------------------

That should bring you up to the line "To identify finer substructure among these classes, classes with more than 200 cells were selected for subclustering", at which point they commence a second round of ICA on a finer subset of genes, it seems.

Unfortunately, following bioinformatics methods can be a nightmare, because it is impossible to write out accurately, in plain English, every minute detail that a comprehensive methodology requires.

Reply written 4 months ago by Kevin Blighe

You might also want to take a look at this article to get ideas for alternatives to Pearson correlation.
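
For a concrete starting point, here is a minimal Python sketch of a gene-gene correlation matrix, with Spearman rank correlation as one possible alternative to Pearson. Spearman is my own choice of example and not necessarily what the article recommends. The function names and the cells x genes orientation are assumptions.

```python
import numpy as np

def average_ranks(column):
    """Rank values from 1..n, giving tied values their average rank."""
    _, inverse, counts = np.unique(column, return_inverse=True,
                                   return_counts=True)
    last_rank = np.cumsum(counts)
    return (last_rank - (counts - 1) / 2.0)[inverse]

def gene_gene_correlation(expr, method="pearson"):
    """expr: cells x genes. Returns a genes x genes correlation matrix."""
    expr = np.asarray(expr, dtype=float)
    if method == "spearman":
        # Spearman is Pearson computed on the per-gene ranks.
        expr = np.apply_along_axis(average_ranks, 0, expr)
    elif method != "pearson":
        raise ValueError("method must be 'pearson' or 'spearman'")
    return np.corrcoef(expr, rowvar=False)
```

Genes with zero variance produce NaN entries, so drop them first. Average ranks matter here because scRNA-seq matrices contain many tied zeros.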

Reply written 4 months ago by kristoffer.vittingseerup