I would like to determine an overarching 'theme' across 36 differentially expressed gene (DEG) lists, one per single-cell RNA-seq cluster (about half the clusters come from one age and half from another).
I have GO terms for each of the 36 gene lists, generated using goana (limma, R). Now I want to see which GO terms are most common across them.
The idea is to get an overall picture of the effect our treatment is having on the organ we're studying. Doing this by cluster, rather than lumping the DEGs from all scRNA-seq clusters together and generating a single GO term list, gives us a better representation of how the treatment is generally affecting the organ. If we combined the DEGs from all clusters, a gene that is differentially expressed in every cluster (and so influences the GO terms in every cluster) would be represented only once in the combined gene list.
So, can anyone help out? I've been searching extensively, and am yet to find any turn-key packages or vignettes on how it might be done in R.
many thanks, K
Can you give an example of what one of your GO results looks like?
Thanks - pasted below.
Thanks @rpolicastro, this is the top 20 terms from one of the clusters. It was generated using goana from the R limma package.
I'm not sure how your data is being stored now, but suppose you have a list of GO results for each cluster in the variable
Here's a tidyverse solution to count the total occurrence of each term in all sets of data.
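A minimal sketch of that counting step, under some assumptions: the results are held in a named list (called `go_results` here, a hypothetical name) with one goana-style data frame per cluster, GO IDs as row names, and the usual `Term`, `Ont`, `N`, `DE` and `P.DE` columns. The toy data and the `p_cutoff` value are made up for illustration only.

```r
library(dplyr)
library(purrr)
library(tibble)

# Toy stand-in for the real data (assumption): a named list with one
# goana-style data frame per cluster, GO IDs stored as row names.
go_ids <- c("GO:0006955", "GO:0006915", "GO:0007155")
go_terms <- c("immune response", "apoptotic process", "cell adhesion")
toy_go <- function(p) {
  data.frame(Term = go_terms, Ont = "BP", N = 100, DE = 10, P.DE = p,
             row.names = go_ids)
}
go_results <- list(
  cluster_1 = toy_go(c(1e-6, 0.30, 1e-5)),
  cluster_2 = toy_go(c(1e-4, 1e-3, 0.20)),
  cluster_3 = toy_go(c(1e-3, 2e-3, 0.30))
)

# Assumed significance threshold; choose one that suits your data.
p_cutoff <- 0.01

term_counts <- go_results %>%
  # Move the GO IDs out of the row names so they survive the row bind.
  map(~ rownames_to_column(.x, var = "GO_ID")) %>%
  # Stack all clusters, recording the source cluster in a new column.
  bind_rows(.id = "cluster") %>%
  # Keep only terms called significant within their own cluster.
  filter(P.DE < p_cutoff) %>%
  # Count the number of clusters in which each term is significant.
  count(GO_ID, Term, Ont, name = "n_clusters", sort = TRUE)

print(term_counts)
```

Terms at the top of `term_counts` are the ones enriched in the most clusters. Because each cluster contributes at most one row per GO term, `n_clusters` is a count of clusters rather than of genes. Note that `P.DE` from goana is not adjusted for multiple testing, so a stricter cutoff or an adjusted p-value may be preferable before counting.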
If you explain how the GO results are being stored now I can update the answer with more specific details.