Hello all,
I am trying to use topGO in R to do a GO analysis. For genelist, I collected all Arabidopsis genes from the TAIR10 representative gene models list (https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_gene_lists/TAIR10_representative_gene_models) to define the gene universe.
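A minimal sketch of how a genelist of this kind can be prepared for topGO, which expects a named factor (or numeric vector) whose names are the gene identifiers. The IDs and the genes of interest below are hypothetical, and the ".1" model suffix is an assumption about how the TAIR file is formatted:

```r
# Hypothetical IDs in the style of a representative gene model list
# (assumed format: locus ID followed by a ".N" model suffix).
model_ids <- c("AT1G01010.1", "AT1G01020.1", "AT1G01030.1", "AT1G01040.2")

# Strip the model suffix to get locus-level identifiers.
locus_ids <- unique(sub("\\.[0-9]+$", "", model_ids))

# Hypothetical genes of interest (placeholder for a real selection).
interesting <- c("AT1G01010", "AT1G01040")

# Named two-level factor: 1 = gene of interest, 0 = background.
genelist <- factor(as.integer(locus_ids %in% interesting), levels = c(0, 1))
names(genelist) <- locus_ids

str(genelist)
```

If the names of genelist still carry the model suffix, they may not match the identifiers stored in the annotation package, so it can be worth checking head(names(genelist)) before building the topGOdata object.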
Then I run
GOdata = new("topGOdata", ontology = "BP", allGenes = genelist, annot = annFUN.org, mapping = "org.At.tair.db")
and it gives me this error:
Nothing to do: Error in split.default(names(sort(nl)), f.index) : first argument must be a vector
After googling around, I found this thread (https://support.bioconductor.org/p/132621/). Following the suggestion there, I re-ran the call with ID = "symbol" added:
GOdata = new("topGOdata", ontology = "BP", allGenes = genelist, annot = annFUN.org, mapping = "org.At.tair.db", ID = "symbol")
This time it looks good.
However, when I check the GOdata object, it shows:
So there is ONLY 1 feasible gene out of 33602 available genes, which is quite weird.
After going through the tutorial (https://www.bioconductor.org/packages/devel/bioc/vignettes/topGO/inst/doc/topGO.pdf), I found that the annFUN.org function uses the mappings from the "org.XX.XX" annotation packages. According to the tutorial, the function currently supports the following gene identifiers: Entrez, GenBank, Alias, Ensembl, Gene Symbol, GeneName and UniGene.
So I tried these different gene identifiers, e.g. with "Ensembl":
GOdata = new("topGOdata", ontology = "BP", allGenes = genelist, annot = annFUN.org, mapping = "org.At.tair.db", ID = "Ensembl")
However, I got another error:
Building most specific GOs ..... Error: no such table: ensembl
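One way to see which identifier types org.At.tair.db actually provides, and whether a given ID style matches its keys, is to query the package with the AnnotationDbi helpers keytypes() and keys(). This is only a sketch, assuming org.At.tair.db is installed; the two example IDs are hypothetical:

```r
library(org.At.tair.db)  # Bioconductor annotation package for Arabidopsis

# List the identifier types this annotation package supports.
keytypes(org.At.tair.db)

# Look at a few of the package's TAIR locus identifiers.
tair_keys <- keys(org.At.tair.db, keytype = "TAIR")
head(tair_keys)

# Hypothetical IDs: one locus-level, one with a gene model suffix.
ids <- c("AT1G01010", "AT1G01010.1")

# Check which of them are present among the package's TAIR keys.
ids %in% tair_keys
```

Running the same %in% check on names(genelist) would show how many genes in the universe can be matched to the annotation at all.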
Can anyone help me troubleshoot this? Any suggestions will be appreciated.
Thanks a lot!