Gene ontology with locus tags
20 months ago
sea.joson ▴ 10

Hello!

I apologize if this seems like a silly question, but I haven't been able to find an answer. I ran differential expression on my transcriptomes, and I'd now like to do a Gene Ontology (GO) analysis on the results. I've seen a labmate do it before, and from my perspective it looked as simple as pasting the list of genes into a web tool and getting back the associated GO terms.

My problem is that all my genes are identified by locus tags, which tools such as DAVID and PANTHER don't recognize. How do I proceed? Do I have to convert my list of genes somehow? Any insight would be appreciated!

locus tags examples: "G4B11_000096 G4B11_000117 G4B11_000118 G4B11_000119 G4B11_000120 G4B11_000135 G4B11_000139"

ontology locus gene tags • 1.3k views

There are many tools that can work. One I particularly like is g:Profiler, which I use in R through the gprofiler2 package. You can supply a custom GO universe to compare against, which makes working with non-model organisms easy. As long as you can associate your locus tags with GO terms, for example through the annotations in a GFF file, you should be fine. There are plenty of tutorials out there.

edit: I don't see any way around identifying which genes are present at those loci. GO terms are associated with transcripts, not genomic loci.
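As a rough illustration of associating locus tags with GO terms from a GFF, here is a minimal Python sketch. It assumes the GFF3 file stores the tag in a `locus_tag` attribute and the GO IDs in the `Ontology_term` attribute; your annotation pipeline may use different attribute names, so check column 9 of your own file first. The function names are hypothetical, and the GO IDs in the example lines are placeholders, not real annotations for these loci. The output is a list of GMT-style lines (term, description, then tab-separated genes), the format g:Profiler accepts for custom annotation sources.

```python
import re
from collections import defaultdict
from urllib.parse import unquote

GO_ID = re.compile(r"GO:\d{7}")


def parse_attributes(column9):
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def locus_tags_to_go(gff_lines):
    """Map each locus tag to the set of GO IDs annotated on its features.

    Assumes 'locus_tag' and 'Ontology_term' attributes (an assumption;
    adjust the names to match your file).
    """
    mapping = defaultdict(set)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attributes(cols[8])
        tag = attrs.get("locus_tag")
        go_ids = GO_ID.findall(attrs.get("Ontology_term", ""))
        if tag and go_ids:
            mapping[tag].update(go_ids)
    return dict(mapping)


def to_gmt_lines(tag_to_go):
    """Invert the mapping into GMT lines: term, description, member genes."""
    term_to_tags = defaultdict(set)
    for tag, terms in tag_to_go.items():
        for term in terms:
            term_to_tags[term].add(tag)
    return [
        "\t".join([term, term] + sorted(tags))
        for term, tags in sorted(term_to_tags.items())
    ]


# Placeholder feature lines: the GO IDs are invented for illustration only.
EXAMPLE_GFF = [
    "##gff-version 3",
    "chr1\tsrc\tCDS\t1\t300\t.\t+\t0\tID=cds1;locus_tag=G4B11_000096;Ontology_term=GO:0005524,GO:0016301",
    "chr1\tsrc\tCDS\t400\t900\t.\t-\t0\tID=cds2;locus_tag=G4B11_000117;Ontology_term=GO:0005524",
    "chr1\tsrc\tgene\t1000\t1500\t.\t+\t.\tID=gene3;locus_tag=G4B11_000118",
]

if __name__ == "__main__":
    for gmt_line in to_gmt_lines(locus_tags_to_go(EXAMPLE_GFF)):
        print(gmt_line)
```

Locus tags without any GO term (the third feature above) are left out of the mapping, but they should still be counted in the background set when you run the enrichment.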


This is the right idea. GO annotations describe the natural function of a particular gene, so they are tied to gene products or protein complexes. GO defines gene products as 'any genes and entities encoded by the gene, e.g. proteins and functional RNAs'. Some cross-references can be found in our FAQ on IDs.

For PANTHER, the supported IDs are listed at www.pantherdb.org/tips/tips_batchIdSearch_supportedId.jsp:

  • Ensembl: Ensembl gene identifier. Example: "ENSG00000126243"
  • Ensembl_PRO: Ensembl protein identifier. Example: "ENSP00000337383"
  • Ensembl_TRS: Ensembl transcript identifier. Example: "ENST00000391828"
  • Gene ID: EntrezGene IDs. Examples: "GeneID:10203", "10203" (for Entrez gene GeneID:10203)
  • Gene symbol: Example: "CALCA"
  • GI: NCBI GI numbers. Example: "16033597"
  • HGNC: HUGO Gene Nomenclature IDs. Example: "HGNC:16673"
  • IPI: International Protein Index IDs. Example: "IPI00740702"
  • UniGene: NCBI UniGene IDs. Examples: "Hs.654587", "At.36040"
  • UniProtKB: UniProt accession. Example: "O80536"
  • UniProtKB-ID: UniProt ID. Example: "AGAP3_HUMAN"

Can you expound on creating a custom GO universe? Googling it didn't really yield too many hits. I do have a GFF file, but I still need to check out some tutorials to see how I can move forward.


A universe is the background collection of genes, with their GO annotations, that you want to test against. A GO enrichment test checks whether any GO terms are overrepresented in your genes of interest (here, the differentially expressed ones) compared with the universe. The universe is typically the entire transcriptome of the genome assembly you are using.
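To make the idea concrete, here is a minimal Python sketch of an over-representation test for a single GO term, using a one-sided hypergeometric test. This is one common way to do it, not necessarily what any particular tool runs internally; the function name and the gene IDs are hypothetical.

```python
from math import comb


def enrichment_pvalue(universe, annotated, selected):
    """One-sided hypergeometric p-value for over-representation of one term.

    universe  -- all genes in the background
    annotated -- genes carrying the GO term
    selected  -- genes of interest (e.g. differentially expressed)
    Genes outside the universe are ignored.
    """
    universe = set(universe)
    annotated = set(annotated) & universe
    selected = set(selected) & universe

    n_universe = len(universe)
    n_annotated = len(annotated)
    n_selected = len(selected)
    n_overlap = len(annotated & selected)

    # P(X >= n_overlap) when drawing n_selected genes without replacement.
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_annotated, i) * comb(n_universe - n_annotated, n_selected - i)
        for i in range(n_overlap, min(n_annotated, n_selected) + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up gene IDs: 10 genes in the universe, 3 carry the term,
    # and all 3 are among the 3 selected genes.
    universe = [f"gene{i}" for i in range(10)]
    annotated = ["gene0", "gene1", "gene2"]
    selected = ["gene0", "gene1", "gene2"]
    print(enrichment_pvalue(universe, annotated, selected))
```

A real analysis repeats this for every GO term and then corrects the p-values for multiple testing, which is part of what the dedicated tools handle for you.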

Tools like topGO can use the organism annotation packages (OrgDb packages such as org.Hs.eg.db, available on Bioconductor), which have all the annotations ready for each supported species, but the list of supported species is pretty limited. If you're working with a non-model species, you're stuck using tools like g:Profiler. See the vignette here.
