Question

1 feasible gene out of 33,602 available genes in topGOdata object


2.0 years ago

liyong ▴ 80

Hello all,

I am trying to use topGO in R for GO enrichment analysis. To define the gene universe (genelist), I collected all Arabidopsis genes from the TAIR10 representative gene models list (https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_gene_lists/TAIR10_representative_gene_models).

When I run GOdata = new("topGOdata", ontology = "BP", allGenes = genelist, annot = annFUN.org, mapping = "org.At.tair.db"), I get the error: Nothing to do: Error in split.default(names(sort(nl)), f.index) : first argument must be a vector. After searching around, I found this thread (https://support.bioconductor.org/p/132621/). Following its suggestion, I added ID = "symbol" and re-ran GOdata = new("topGOdata", ontology = "BP", allGenes = genelist, annot = annFUN.org, mapping = "org.At.tair.db", ID = "symbol"), which completed without an error.

However, when I inspect GOdata, the summary (originally posted as a screenshot) shows ONLY 1 feasible gene out of 33,602 available genes, which is quite odd.

After going through the vignette (https://www.bioconductor.org/packages/devel/bioc/vignettes/topGO/inst/doc/topGO.pdf), I found that the annFUN.org function uses the mappings from the "org.XX.XX" annotation packages. According to the vignette, the function currently supports the following gene identifiers: Entrez, GenBank, Alias, Ensembl, Gene Symbol, GeneName and UniGene.

So I tried these different gene identifiers, e.g. with "Ensembl": GOdata = new("topGOdata", ontology = "BP", allGenes = genelist, annot = annFUN.org, mapping = "org.At.tair.db", ID = "Ensembl"). However, this gave another error: Building most specific GOs ..... Error: no such table: ensembl.

Can anyone help me troubleshoot this? Any suggestions would be appreciated.

Thanks a lot!

topGO GO • 623 views


score 1 · Accepted Answer · 2022-11-04


2.0 years ago

liyong ▴ 80

The problem turned out to be the format of my gene IDs.

The IDs I used included transcript numbers (e.g. "AT4G37770.1"), so they were really transcript IDs rather than gene IDs. After I removed the transcript suffix (e.g. changed "AT4G37770.1" to "AT4G37770"), everything looks good.
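For anyone hitting the same issue, here is a minimal base-R sketch of the suffix removal described above. It assumes the IDs follow the "locus.number" pattern seen in the example. The two extra IDs and the `interesting` vector are hypothetical placeholders for illustration, not values from my analysis.

```r
# Example transcript-level IDs; only AT4G37770.1 is from the post, the rest are placeholders
transcript_ids <- c("AT4G37770.1", "AT1G01010.1", "AT1G01020.1", "AT1G01020.2")

# Drop the trailing ".<number>" transcript suffix to get gene-level IDs,
# then de-duplicate so each gene appears once in the universe
gene_ids <- unique(sub("\\.[0-9]+$", "", transcript_ids))

# Hypothetical set of genes of interest (assumption, for illustration only)
interesting <- c("AT4G37770")

# topGO takes allGenes as a named factor (or named numeric vector);
# the names are the gene IDs, and level 1 marks the genes of interest
genelist <- factor(as.integer(gene_ids %in% interesting), levels = c(0, 1))
names(genelist) <- gene_ids
```

The resulting genelist can then be passed as allGenes in the new("topGOdata", ...) call from the question. It is worth checking the number of feasible genes in the GOdata summary again afterwards.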
