I'm currently conducting a transcriptomics study on Nicotiana benthamiana, and have reached the point where I would like to run a GO enrichment analysis using GOseq or topGO. Neither of these supports N. benthamiana natively, so I would have to provide the GO mappings for the N. benthamiana genes myself. Other studies using the same sequence and annotations have used GOseq and topGO, but their methods sections do not go into detail about how they did this.
Of course I could simply feed a fasta of the DEGs into blast2go, but I don't actually have blast2go, and I feel like I'm missing something.
GOseq takes its gene-to-category associations as a named list of vectors, where each vector holds the terms associated with one gene. As such, a GAF file isn't necessarily the end of the solution anyway.
I'd create this object from the annotation txt file.
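As a rough sketch of that step, assuming the annotation file has been read into a data frame with one row per gene and a column of comma-separated GO IDs. The column names gene_id and go_terms and the example rows are made up for illustration and are not taken from the real N. benthamiana annotation:

```r
# Hypothetical stand-in for the annotation txt file; in practice this would
# come from something like read.delim() on the real file.
annot <- data.frame(
  gene_id  = c("geneA", "geneB", "geneC"),
  go_terms = c("GO:0006412, GO:0005840", "GO:0016020", ""),
  stringsAsFactors = FALSE
)

# Split each GO string into a character vector of IDs, one vector per gene.
gene2cat <- strsplit(annot$go_terms, ",", fixed = TRUE)
names(gene2cat) <- annot$gene_id

# Trim stray whitespace and drop genes that have no GO terms at all.
gene2cat <- lapply(gene2cat, trimws)
gene2cat <- gene2cat[lengths(gene2cat) > 0]

str(gene2cat)
```

The resulting named list has the shape described above, so it should be usable as the gene2cat argument of goseq().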
This seems to work almost perfectly. The function descriptions in brackets next to the GO terms in the functional annotation file are unhelpfully worded, so I will have to use a few lines of gsub to remove the bits of leftover text.
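One alternative to stripping the descriptions piece by piece is to pull out only the GO IDs with a regular expression. A minimal sketch, assuming the IDs follow the usual "GO:" plus seven digits pattern; the example strings are invented, and the wording in the real file may differ:

```r
# Invented examples of annotation strings with descriptions in brackets.
raw <- c(
  geneA = "GO:0006412 (translation); GO:0005840 (ribosome)",
  geneB = "GO:0016020 (membrane)",
  geneC = "no GO terms assigned"
)

# Keep only substrings that look like GO IDs; everything else is discarded.
matches <- regmatches(raw, gregexpr("GO:[0-9]{7}", raw))
names(matches) <- names(raw)

# Drop genes for which no GO ID was found.
gene2cat <- matches[lengths(matches) > 0]

str(gene2cat)
```

This avoids having to anticipate every form the leftover text can take, at the cost of silently ignoring anything that is not a well-formed GO ID.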
Thank you for the help; I'm running into the same problem. What I can't understand is how the gene2cat file should look. In my case, I have a table with the gene IDs and the GO terms divided by category (GO_Biological, Go_Cellular, etc.) (see picture at bottom). I tried it on GO_Biological using your tip:
The output looks good (each gene with its bunch of GO terms). However, when I run the following:
I obtain this error:
I don't know how to prepare the gene2cat object properly. In addition, I would like to use all the GO categories at the same time. Do you have any suggestions on how to proceed? Thank you!!
If you are providing your own categories, then the test.cats parameter doesn't apply. All your categories are BP, so you don't need to tell it to run only those categories.
Thank you. In the end I had to rework the whole document with BASH and R, forcing it to look like this:
"GENEID" "ONTOLOGY" "GOID" "TERM"
PITA_XXXXX CC GO:XXXXX cellular_component
with an individual GO term in each row, for all the genes. By the way, annotation1000$Query_sequence was my mistake.
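For anyone landing here with the same long-format table, a minimal sketch of turning it into the named list, with all three ontologies combined. The gene IDs are placeholders and the GO IDs are just the three ontology root terms; only the column names follow the layout above:

```r
# Placeholder long-format table: one GO term per row.
go_long <- data.frame(
  GENEID   = c("PITA_00001", "PITA_00001", "PITA_00002", "PITA_00002"),
  ONTOLOGY = c("CC", "BP", "MF", "MF"),
  GOID     = c("GO:0005575", "GO:0008150", "GO:0003674", "GO:0003674"),
  stringsAsFactors = FALSE
)

# Ignore the ontology column so BP, CC and MF terms are pooled per gene,
# and remove repeated gene/term pairs.
pairs <- unique(go_long[, c("GENEID", "GOID")])

# One character vector of GO IDs per gene, named by gene ID.
gene2cat <- split(pairs$GOID, pairs$GENEID)

str(gene2cat)
```

As far as I know, goseq's documentation also describes gene2cat as accepting a two-column data frame of gene and category, so the pairs table itself may work without the split() step.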