Question

Gene ontology with locus tags

21 months ago

sea.joson ▴ 10

Hello!

I apologize if this seems like a silly question, but I haven't had success in finding answers yet. I ran differential expression on my transcriptomes, and one thing I'd really like to do next is a gene ontology (GO) analysis of the results. I've seen a labmate do it before, and from my perspective it seemed as simple as pasting the list of genes into a search engine and getting back the different GO terms.

My problem is that all my genes are identified by locus tags, which DAVID, PANTHER, and similar tools don't recognize. How do I proceed? Do I have to convert my list of genes somehow? Any insight would be appreciated!

Example locus tags: "G4B11_000096 G4B11_000117 G4B11_000118 G4B11_000119 G4B11_000120 G4B11_000135 G4B11_000139"

ontology locus gene tags • 1.3k views

ADD COMMENT • link updated 21 months ago by dthorbur ★ 2.5k • written 21 months ago by sea.joson ▴ 10

There are many tools that can work. One I particularly like is g:Profiler, which I use in R through the gprofiler2 package. You can create a custom GO universe to compare against, which makes working with non-model organisms easy. So long as you can associate your locus tags with GO terms, for example through the annotations in a GFF file, you should be fine. There are plenty of tutorials out there.

Edit: I don't see any way around identifying which genes are present at those loci, because GO terms are associated with transcripts, not genomic loci.
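
As a rough sketch of that association step, the Python below pulls locus tags and GO IDs out of the attribute column of GFF3 lines. It assumes each annotated feature carries a `locus_tag` attribute and that its GO IDs appear somewhere in the same attribute column (for example under `Ontology_term`); attribute names differ between annotation pipelines, so check your own file first. The function name and the example lines are made up for illustration, and the GO assignments in them are not real annotations for these locus tags.

```python
import re
from collections import defaultdict
from urllib.parse import unquote

# GO identifiers are "GO:" followed by seven digits.
GO_ID = re.compile(r"GO:\d{7}")


def locus_tags_to_go(gff_lines, tag_key="locus_tag"):
    """Map each locus tag to the set of GO IDs found in its GFF3 attributes."""
    gene2go = defaultdict(set)
    for line in gff_lines:
        # Skip comments, directives, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = {}
        for field in cols[8].split(";"):
            if "=" in field:
                key, value = field.split("=", 1)
                attrs[key.strip()] = unquote(value)
        tag = attrs.get(tag_key)
        if tag is None:
            continue
        # Decode URL escapes (e.g. %3A) before searching for GO IDs.
        go_ids = GO_ID.findall(unquote(cols[8]))
        if go_ids:
            gene2go[tag].update(go_ids)
    return dict(gene2go)


# Hypothetical two-feature GFF3 snippet; the annotations are invented.
example_gff = [
    "##gff-version 3",
    "\t".join(["contig1", "toy", "CDS", "100", "400", ".", "+", "0",
               "ID=cds1;locus_tag=G4B11_000096;Ontology_term=GO:0005524,GO:0016301"]),
    "\t".join(["contig1", "toy", "CDS", "900", "1500", ".", "-", "0",
               "ID=cds2;locus_tag=G4B11_000117;Ontology_term=GO:0006412"]),
]

gene2go = locus_tags_to_go(example_gff)
for tag, terms in sorted(gene2go.items()):
    print(tag, ",".join(sorted(terms)))
```

If your GFF has no GO terms at all, this returns an empty mapping, and the annotations would have to come from a separate functional annotation step first.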

ADD REPLY • link 21 months ago by dthorbur ★ 2.5k

This is the right idea. GO annotations describe the natural function of a particular gene, so they are tied to gene products or protein complexes. GO defines gene products as 'any genes and entities encoded by the gene, e.g. proteins and functional RNAs'. Some cross references can be found in our FAQ on IDs.

For PANTHER, the supported IDs are listed at www.pantherdb.org/tips/tips_batchIdSearch_supportedId.jsp:

- Ensembl: Ensembl gene identifier. Example: "ENSG00000126243"
- Ensembl_PRO: Ensembl protein identifier. Example: "ENSP00000337383"
- Ensembl_TRS: Ensembl transcript identifier. Example: "ENST00000391828"
- Gene ID: EntrezGene IDs. Examples: "GeneID:10203", "10203" (for Entrez gene GeneID:10203)
- Gene symbol: Example: "CALCA"
- GI: NCBI GI numbers. Example: "16033597"
- HGNC: HUGO Gene Nomenclature IDs. Example: "HGNC:16673"
- IPI: International Protein Index IDs. Example: "IPI00740702"
- UniGene: NCBI UniGene IDs. Examples: "Hs.654587", "At.36040"
- UniProtKB: UniProt accession. Example: "O80536"
- UniProtKB-ID: UniProt ID. Example: "AGAP3_HUMAN"

ADD REPLY • link 21 months ago by geneontologyhelp ▴ 420

Can you expound on creating a custom GO universe? Googling it didn't really yield too many hits. I do have a GFF file, but I still need to check out some tutorials to see how I can move forward.

ADD REPLY • link 21 months ago by sea.joson ▴ 10

A universe is the background set of genes, together with their GO annotations, that you want to test against. A GO enrichment test checks whether any GO terms are overrepresented in your outlier genes (in your case, the differentially expressed ones) in comparison to the universe. The universe is typically the entire transcriptome of the genome assembly you are using.
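
To make the idea concrete, here is a minimal sketch of an overrepresentation test in plain Python, using a one-sided hypergeometric test, which is a common choice for this kind of analysis. It is not how any particular tool implements it, the function names and the toy gene IDs are made up, and it leaves out multiple-testing correction, which a real analysis needs.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, term_in_universe, hit_count, term_in_hits):
    """One-sided p-value: chance of seeing at least `term_in_hits` annotated
    genes when drawing `hit_count` genes at random from the universe."""
    total = comb(universe_size, hit_count)
    upper = min(term_in_universe, hit_count)
    tail = sum(
        comb(term_in_universe, i)
        * comb(universe_size - term_in_universe, hit_count - i)
        for i in range(term_in_hits, upper + 1)
    )
    return tail / total


def go_enrichment(gene2go, universe, hits):
    """Return {GO term: p-value} for terms seen at least once among the hits."""
    universe = set(universe)
    hits = set(hits) & universe  # hits outside the universe are ignored
    # Invert gene -> terms into term -> genes, restricted to the universe.
    term_genes = {}
    for gene in universe:
        for term in gene2go.get(gene, ()):
            term_genes.setdefault(term, set()).add(gene)
    results = {}
    for term, genes in term_genes.items():
        overlap = len(genes & hits)
        if overlap == 0:
            continue
        results[term] = hypergeom_enrichment_p(
            len(universe), len(genes), len(hits), overlap
        )
    return results


# Toy data: 10 genes in the universe, 3 of them differentially expressed.
toy_universe = [f"gene{i}" for i in range(10)]
toy_gene2go = {
    "gene0": {"GO:0006412"},
    "gene1": {"GO:0006412"},
    "gene2": {"GO:0006412"},
    "gene3": {"GO:0005524"},
    "gene4": {"GO:0005524"},
}
toy_hits = ["gene0", "gene1", "gene2"]

toy_results = go_enrichment(toy_gene2go, toy_universe, toy_hits)
for term, p_value in sorted(toy_results.items()):
    print(term, p_value)
```

The choice of universe matters here: shrinking or enlarging `toy_universe` changes every p-value, which is why the background should be the set of genes that could actually have shown up in your results.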

Tools like topGO can use the organism annotation packages from Bioconductor (OrgDb packages such as org.Hs.eg.db), which have all the annotations ready for a given species, but the list of supported species is pretty limited. If you're using a non-model species, you're stuck with tools that accept your own annotations, like g:Profiler. See the vignette here.
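
For the g:Profiler route, custom annotations are supplied as a GMT file, a tab-separated format with one gene set per line: a set ID, a description, then the member genes. The sketch below writes such lines from a locus-tag-to-GO mapping. The mapping, the term descriptions, and the function name are placeholders for illustration; in practice the mapping would come from your own annotation.

```python
def to_gmt_lines(gene2go, descriptions=None):
    """Build GMT lines (term, description, genes...) from a gene -> GO mapping."""
    descriptions = descriptions or {}
    # Invert gene -> terms into term -> genes.
    term_genes = {}
    for gene, terms in gene2go.items():
        for term in terms:
            term_genes.setdefault(term, set()).add(gene)
    lines = []
    for term in sorted(term_genes):
        genes = sorted(term_genes[term])
        # Fall back to the term ID when no description is available.
        description = descriptions.get(term, term)
        lines.append("\t".join([term, description, *genes]))
    return lines


# Placeholder mapping; these GO assignments are invented for the example.
example_gene2go = {
    "G4B11_000096": {"GO:0005524"},
    "G4B11_000117": {"GO:0005524", "GO:0006412"},
}
example_descriptions = {"GO:0005524": "ATP binding", "GO:0006412": "translation"}

gmt_lines = to_gmt_lines(example_gene2go, example_descriptions)
print("\n".join(gmt_lines))
```

Writing `gmt_lines` to a file gives something you can upload as a custom annotation source; the list of all locus tags in your assembly then serves as the custom background.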

ADD REPLY • link 21 months ago by dthorbur ★ 2.5k