Question

Co-expression analysis of single cell RNA seq data to identify pathway associations

5.1 years ago

james.drew101 • 0

Hi there. Sorry in advance for the lengthy background - I think it's necessary.

My background is in wet lab biology so priming you that my biology >> bioinformatics understanding - apologies for any obvious blunders. I've got some experience coding for image analysis in MATLAB + Python but am a hardcore R-based bioinformatics noob.

I'm wanting to explore whether scRNA data can be used to identify functional pathways that are co-expressed in individual cells. I've found lots of papers that do this (e.g. https://www.frontiersin.org/articles/10.3389/fgene.2019.00953/full ), and Weighted Gene Co-expression Network Analysis (WGCNA) seems like a common approach in the field. However, these mostly use an unbiased approach, and (as I understand it) are making comparisons between pairs of genes to generate pathway clusters de novo rather than looking at pre-existing pathways (e.g. Gene Ontology, Reactome).

I want to apply this sort of analysis to a specific pathway and see which other pathways co-express with it within a single cell type population. E.g. cancer cells are heterogeneous and there will be inter-cell variability in the expression of, say, glucose metabolism genes (assume this can be identified as a gene set). It would be interesting to know if this variability in glucose metabolism was correlated with changes in other functional pathways, suggesting that these processes might be mechanistically linked.

I'd really like any advice/comments/thoughts on (1) whether this approach makes sense, bioinformatically and (2) how best to approach it. Some things I'm unsure on are

How to normalise expression values (e.g. FPKM) for this
Whether gene-to-gene vs pathway-to-pathway co-expression comparisons are best
How to statistically deal with this more biased approach (explicitly looking at a particular pathway and seeing what co-expresses) compared to traditional unbiased analyses.

Thanks in advance for your help! Happy to add any clarifications

RNA-Seq R co-expression cancer • 2.3k views

updated 5.1 years ago by Kevin Blighe 89k • written 5.1 years ago by james.drew101 • 0

score 0 · Answer 1 · 2020-06-07

You may want to use GSVA, which is available in R / Bioconductor. It will take your input data matrix and 'super-impose' or deconvolute it onto pre-existing pathways and gene signatures. You can also create your own gene signatures and enrich those. So, if your input is a gene x sample data matrix (for single-cell data, each cell is a sample), the output from GSVA will be a pathway/signature x sample matrix of enrichment scores.
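To make the gene x sample to pathway x sample idea concrete, here is a minimal Python sketch. It is not GSVA and does not reproduce its statistics: it simply scores each gene set as the mean per-gene z-score in each cell, which is a much cruder stand-in. The function names, the toy expression values, and the two small gene sets are all hypothetical and only there for illustration.

```python
from statistics import mean, pstdev

def zscore_rows(expr):
    """Standardise each gene across cells; constant genes become all zeros."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out

def score_gene_sets(expr, gene_sets):
    """Return {set_name: [score per cell]} using the mean z-score of member genes.

    This is a simple illustrative score, NOT the GSVA algorithm.
    """
    z = zscore_rows(expr)
    n_cells = len(next(iter(expr.values())))
    scores = {}
    for name, genes in gene_sets.items():
        present = [g for g in genes if g in z]  # ignore genes missing from the matrix
        if not present:
            continue
        scores[name] = [mean(z[g][i] for g in present) for i in range(n_cells)]
    return scores

# Toy gene x cell matrix (made-up values, four cells).
expr = {
    "HK2":   [1.0, 2.0, 3.0, 4.0],
    "PKM":   [2.0, 4.0, 6.0, 8.0],
    "CCNB1": [4.0, 3.0, 2.0, 1.0],
}
# Hypothetical gene sets; real ones would come from e.g. Gene Ontology or Reactome.
gene_sets = {"glycolysis": ["HK2", "PKM"], "cell_cycle": ["CCNB1"]}

scores = score_gene_sets(expr, gene_sets)
for name, per_cell in scores.items():
    print(name, [round(s, 2) for s in per_cell])
```

The result has one row per pathway and one value per cell, which is the same shape of object that a pathway-level method hands on to downstream analysis.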

How to normalise expression values (e.g. FPKM) for this

If FPKM is all that you have, I would transform it to Z-scores via the zFPKM R package.

Whether gene-to-gene vs pathway-to-pathway co-expression comparisons are best

Both are valid and, when I am doing a comprehensive analysis for somebody, I would check both (time-permitting).
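For the pathway-to-pathway route, one simple option (an assumption on my part, not the only way to do it) is to take per-cell pathway scores within one cell type and correlate the pathway of interest against every other pathway. The Python sketch below uses plain Pearson correlation; the function names and the score values are hypothetical, and in practice the scores would come from a pathway-scoring step rather than being typed in.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def rank_coexpressed(scores, target):
    """Correlate the target pathway with all others, strongest |r| first."""
    ref = scores[target]
    pairs = [(name, pearson(ref, vals))
             for name, vals in scores.items() if name != target]
    return sorted(pairs, key=lambda p: abs(p[1]), reverse=True)

# Made-up per-cell pathway scores for four cells of one cell type.
pathway_scores = {
    "glycolysis": [-1.2, -0.4, 0.5, 1.1],
    "hypoxia":    [-1.0, -0.5, 0.4, 1.1],
    "cell_cycle": [1.3, 0.2, -0.6, -0.9],
    "apoptosis":  [0.1, -0.2, 0.2, -0.1],
}

ranked = rank_coexpressed(pathway_scores, "glycolysis")
for name, r in ranked:
    print(f"{name}\t{r:+.2f}")
```

Sorting by absolute correlation keeps strongly anti-correlated pathways near the top as well, since those can be just as interesting mechanistically. With many pathways tested, the resulting p-values would need multiple-testing correction.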

How to statistically deal with this more biased approach (explicitly looking at a particular pathway and seeing what co-expresses) compared to traditional unbiased analyses.

I do not see it as biased - you could argue that it is better to use pre-defined pathways as opposed to trying to define novel ones. To clarify, though, programs like WGCNA and my own tutorial ( Network plot from expression data in R using igraph ) just identify clusters / modules / communities and these can then be enriched against known pathways.

Kevin