I apologize if this seems like a silly question, but I haven't had any success finding answers yet. I have transcriptomes that I ran differential expression on, and one thing I'd really like to do is some gene ontology (GO) analysis with them. I've seen a labmate do it before, and from my perspective it looked as simple as pasting the list of genes into a search engine and getting back the different GO terms.
My problem is that all my genes are identified by locus tags, which aren't recognized by DAVID, PANTHER, and similar tools. How do I proceed? Do I have to convert my list of genes somehow? Any insight would be appreciated!
Example locus tags: "G4B11_000096 G4B11_000117 G4B11_000118 G4B11_000119 G4B11_000120 G4B11_000135 G4B11_000139"
There are many tools that can work. One I particularly like is g:Profiler, which I use in R through the gprofiler2 package. You can supply a custom GO universe to compare against, which makes working with non-model organisms easy. As long as you can associate your locus tags with GO terms, for example through the annotations in a GFF, you should be fine. There are plenty of tutorials out there.
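To make the GFF route a bit more concrete, here is a minimal Python sketch of pulling a locus tag to GO mapping out of a GFF3 file and turning it into GMT-style lines (one GO term per line, followed by its genes), which is a common format for custom annotation sources. It assumes your GFF3 carries `locus_tag=` and `Ontology_term=` attributes in column 9; many GFFs do not, in which case you would first need to annotate the proteins with a functional annotation tool. The function names and the GO assignments in the example records are made up for illustration.

```python
import re
from collections import defaultdict


def locus_tag_to_go(gff_lines):
    """Map each locus tag to the set of GO IDs in its GFF3 attributes.

    Assumes column 9 holds `locus_tag=` and `Ontology_term=` attributes.
    """
    mapping = defaultdict(set)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip headers, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a standard feature line
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        tag = attrs.get("locus_tag")
        if tag is None:
            continue
        go_ids = re.findall(r"GO:\d{7}", attrs.get("Ontology_term", ""))
        mapping[tag].update(go_ids)  # unannotated genes keep an empty set
    return mapping


def to_gmt(mapping):
    """Invert gene -> GO into GO -> genes and format as GMT lines."""
    term_genes = defaultdict(set)
    for tag, go_ids in mapping.items():
        for go in go_ids:
            term_genes[go].add(tag)
    # GMT columns: term ID, description (ID reused here), then the genes
    return [
        "\t".join([go, go] + sorted(genes))
        for go, genes in sorted(term_genes.items())
    ]


# Invented example records; the GO assignments are not real annotations.
example = [
    "##gff-version 3",
    "chr1\tsrc\tCDS\t1\t300\t.\t+\t0\tID=cds1;locus_tag=G4B11_000096;Ontology_term=GO:0005524,GO:0016301",
    "chr1\tsrc\tCDS\t400\t900\t.\t-\t0\tID=cds2;locus_tag=G4B11_000117;Ontology_term=GO:0005524",
    "chr1\tsrc\tCDS\t1000\t1200\t.\t+\t0\tID=cds3;locus_tag=G4B11_000118",
]
mapping = locus_tag_to_go(example)
gmt_lines = to_gmt(mapping)
for gmt_line in gmt_lines:
    print(gmt_line)
```

For a real file you would pass an open file handle instead of the list, e.g. `locus_tag_to_go(open("annotation.gff"))`, and write the GMT lines to disk for whichever enrichment tool you use.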
Edit: I don't see any way around identifying which genes are present at those loci. GO terms are associated with transcripts, not genomic loci.
This is the right idea. GO annotations describe the natural function of a particular gene, and are therefore tied to gene products or protein complexes. GO defines gene products as 'any genes and entities encoded by the gene, e.g. proteins and functional RNAs'. Some cross-references can be found in our FAQ on IDs.
For PANTHER, the supported IDs are listed at www.pantherdb.org/tips/tips_batchIdSearch_supportedId.jsp:
Can you expound on creating a custom GO universe? Googling it didn't really yield too many hits. I do have a GFF file, but I still need to check out some tutorials to see how I can move forward.
A universe is the background set of genes, together with their GO annotations, that you want to test against. A GO enrichment test checks whether any GO terms are overrepresented among your outlier genes compared with the universe. The universe is typically the entire transcriptome of the genome assembly you are using.
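To illustrate what the test does with the universe, here is a minimal Python sketch of a one-sided hypergeometric over-representation test for a single GO term. This is the textbook version of the idea, not the exact procedure of any particular tool, and the function names and toy gene IDs are invented for illustration.

```python
from math import comb


def hypergeom_pvalue(universe_size, annotated_in_universe, sample_size, annotated_in_sample):
    """P(X >= annotated_in_sample) when drawing sample_size genes at random."""
    total = comb(universe_size, sample_size)
    upper = min(annotated_in_universe, sample_size)
    tail = sum(
        comb(annotated_in_universe, k)
        * comb(universe_size - annotated_in_universe, sample_size - k)
        for k in range(annotated_in_sample, upper + 1)
    )
    return tail / total


def enrichment_pvalue(universe, term_genes, hits):
    """Over-representation p-value for one GO term.

    universe:   all genes that could have been detected (the background)
    term_genes: genes annotated with the GO term
    hits:       your outlier / differentially expressed genes
    """
    universe = set(universe)
    term_genes = set(term_genes) & universe  # ignore genes outside the universe
    hits = set(hits) & universe
    return hypergeom_pvalue(
        len(universe), len(term_genes), len(hits), len(hits & term_genes)
    )


# Toy data: 20 genes in the universe, 5 carry the term, 3 of 4 hits carry it.
universe = [f"gene_{i}" for i in range(1, 21)]
term_genes = [f"gene_{i}" for i in range(1, 6)]
hits = ["gene_1", "gene_2", "gene_3", "gene_10"]
p = enrichment_pvalue(universe, term_genes, hits)
print(f"p = {p:.4f}")
```

The choice of universe changes the answer: the same hits tested against a larger or smaller background give a different p-value. In a real analysis you would run this for every GO term and then correct for multiple testing.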
Tools like topGO use OrgDb packages (the org.*.db packages, which you can find on Bioconductor) that have all the annotations ready for each species, but the list of supported species is pretty limited. If you're using a non-model species, you're stuck using tools like g:Profiler. See the vignette here.