GO Terms - most common between - many gene lists (from 36 scRNAseq clusters)
2.1 years ago

Hi all,

I would like to determine an overarching 'theme' of what is going on across 36 differentially expressed gene lists (from 36 single-cell RNA-seq clusters, about half from one age and half from another).

I have GO terms for each of the 36 gene lists from the clusters, generated using goana (limma, R). Now I want to see which GO terms are most common across them.

The aim is to get an overall picture of the effect our treatment is having on the organ we're studying. Working cluster by cluster, rather than pooling the DEGs from all scRNA-seq clusters and generating a single GO term list, should give a better representation of how the treatment affects the organ in general. If we combined the DEGs from all clusters, we'd get one GO term list, but a gene that is differentially expressed in every cluster (and so influences the GO terms of every cluster) would be counted only once in the combined list.

So, can anyone help? I've searched extensively and have yet to find any turn-key packages or vignettes showing how this might be done in R.

Many thanks, K

RNA-Seq next-gen

Can you give an example of what one of your GO results looks like?


Thanks - pasted below.

Term                                        Ont N   Up  Down    P.Up        P.Down      ID
ion homeostasis                             BP  16  0   11      1           0.000282    GO:0050801
cation homeostasis                          BP  12  0   9       1           0.000452    GO:0055080
inorganic ion homeostasis                   BP  12  0   9       1           0.000452    GO:0098771
metal ion homeostasis                       BP  12  0   9       1           0.000452    GO:0055065
homeostatic process                         BP  29  0   15      1           0.001418    GO:0042592
renal system development                    BP  11  0   8       1           0.001479    GO:0072001
urogenital system development               BP  12  0   8       1           0.003534    GO:0001655
kidney development                          BP  10  0   7       1           0.004578    GO:0001822
chemical homeostasis                        BP  23  0   12      1           0.00512     GO:0048878
regulation of mRNA metabolic process        BP  6   2   0       0.005865    1           GO:1903311
RNA splicing                                BP  6   2   0       0.005865    1           GO:0008380
monovalent inorganic cation homeostasis     BP  6   0   5       1           0.006144    GO:0055067
nephron development                         BP  6   0   5       1           0.006144    GO:0072006
sodium ion homeostasis                      BP  6   0   5       1           0.006144    GO:0055078
mRNA processing                             BP  7   2   0       0.008165    1           GO:0006397
mRNA metabolic process                      BP  8   2   0       0.010825    1           GO:0016071
RNA processing                              BP  8   2   0       0.010825    1           GO:0006396
cellular cation homeostasis                 BP  9   0   6       1           0.013334    GO:0030003
cellular ion homeostasis                    BP  9   0   6       1           0.013334    GO:0006873
cellular metal ion homeostasis              BP  9   0   6       1           0.013334    GO:0006875


Thanks @rpolicastro, this is the top 20 from one of the clusters. It was generated using goana from the R limma package.


I'm not sure how your data is being stored now, but suppose you have a list of GO results for each cluster in the variable go_results.

Here's a tidyverse solution to count the total occurrences of each term across all sets of data.

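library("tidyverse")

# go_results: a list of data frames, one per cluster (e.g. goana/topGO output)
counts <- go_results %>%
  bind_rows(.id = "cluster") %>%   # stack all clusters, keeping the list name
  count(Term, sort = TRUE)         # occurrences of each term, most common first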


If you explain how the GO results are being stored now I can update the answer with more specific details.
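One caveat with plain counting: as far as I know, goana returns a row for every tested GO term, so unless each data frame has already been trimmed (for example with topGO), every term would get the same count. Below is a minimal base R sketch that filters on a p-value cutoff first and then counts in how many clusters each term is significant. The toy `go_results` list, its values, and the 0.01 cutoff are assumptions for illustration only; the column names follow the table pasted above.

```r
# Toy stand-in for the real data: a named list of goana-style data frames,
# one per cluster (all values invented for illustration).
go_results <- list(
  cluster1 = data.frame(
    ID     = c("GO:0050801", "GO:0008380", "GO:0001822"),
    Term   = c("ion homeostasis", "RNA splicing", "kidney development"),
    P.Up   = c(1, 0.005, 1),
    P.Down = c(0.0003, 1, 0.004)
  ),
  cluster2 = data.frame(
    ID     = c("GO:0050801", "GO:0008380", "GO:0001822"),
    Term   = c("ion homeostasis", "RNA splicing", "kidney development"),
    P.Up   = c(1, 0.2, 1),
    P.Down = c(0.001, 1, 0.3)
  )
)

p_cutoff <- 0.01  # assumed threshold; adjust as needed

# Stack all clusters into one long data frame, remembering the cluster name.
stacked <- do.call(rbind, Map(function(df, cl) {
  cbind(cluster = cl, df[, c("ID", "Term", "P.Up", "P.Down")])
}, go_results, names(go_results)))

# Keep (cluster, term) rows that are significant in either direction.
sig <- stacked[pmin(stacked$P.Up, stacked$P.Down) < p_cutoff, ]

# Number of clusters in which each term is significant, most common first.
term_counts <- sort(table(sig$Term), decreasing = TRUE)
print(term_counts)
```

Counting up- and down-regulated terms separately (filtering on `P.Up` or `P.Down` alone) may be more informative if the direction of the effect matters.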

2.0 years ago
seidel 11k

Why not make a heat map where the columns represent your scRNA-seq clusters, the rows represent GO terms, and each value in the matrix is an enrichment score for that GO term in that cluster? A term highly enriched across clusters will be easy to spot this way. The enrichment score could be an observed/expected gene count ratio, or a p-value for the term in the cluster (perhaps -log transformed).
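The heat map idea could be sketched roughly as follows in base R, using -log10 of the goana `P.Down` column as the enrichment score. This is only an illustration under assumptions: the toy `go_results` list and its values are invented, terms missing from a cluster are given a score of 0, and the 0.01 threshold used for row ordering is arbitrary.

```r
# Toy stand-in: a named list of goana-style results, one per cluster.
go_results <- list(
  cluster1 = data.frame(ID = c("GO:0050801", "GO:0008380", "GO:0001822"),
                        P.Down = c(0.0003, 1, 0.004)),
  cluster2 = data.frame(ID = c("GO:0050801", "GO:0008380", "GO:0001822"),
                        P.Down = c(0.001, 1, 0.3)),
  cluster3 = data.frame(ID = c("GO:0050801", "GO:0008380"),
                        P.Down = c(0.002, 0.5))
)

# The union of GO IDs seen in any cluster defines the rows.
all_ids <- sort(unique(unlist(lapply(go_results, `[[`, "ID"))))

# Term-by-cluster matrix of -log10(P.Down); terms absent from a cluster get 0.
score <- sapply(go_results, function(df) {
  s <- -log10(df$P.Down[match(all_ids, df$ID)])
  s[is.na(s)] <- 0
  s
})
rownames(score) <- all_ids

# Put terms that pass the (assumed) cutoff in the most clusters first.
n_sig <- rowSums(score > -log10(0.01))
score <- score[order(n_sig, decreasing = TRUE), , drop = FALSE]

# Plot without clustering or row scaling so the values stay comparable.
heatmap(score, Rowv = NA, Colv = NA, scale = "none", margins = c(8, 10))
```

With 36 clusters and thousands of GO terms, it would probably help to restrict the rows to terms that pass the cutoff in at least a few clusters before plotting.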