Between any pair of gene lists, you can do a Fisher Exact test, where you compare the proportion of GO category genes in one sample/species to the proportion in another sample/species instead of to a background frequency.
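As a rough sketch of that pairwise comparison, here is a two-sided Fisher Exact test in plain Python (no SciPy needed). It assumes you have already counted, for each of the two gene lists, how many genes fall in the GO category and how many do not. The function name and the example counts are hypothetical. If SciPy is available, `scipy.stats.fisher_exact` should give the same two-sided p-value.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher Exact test on the 2x2 table
           in category   not in category
    list1       a               b
    list2       c               d
    """
    row1, row2 = a + b, c + d
    col1 = a + c          # total genes in the GO category across both lists
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that list1 contains x of the category genes,
        # holding the row and column totals fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more probable than the observed one
    # (small relative tolerance guards against floating-point ties)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)

# Hypothetical counts: 40 of 500 pseudogenes from species 1 are in the category,
# versus 15 of 600 from species 2
print(fisher_exact_two_sided(40, 460, 15, 585))
```

Each species pair gets its own test, so with many species you would still want to correct the resulting p-values for multiple testing.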
I think it would be hard to interpret a single statistical test that compared the results for 10 species simultaneously. I would probably recommend either binning the categories into significant or non-significant (so, category A was significantly enriched with FDR < 0.05 in X / 10 species) or prioritizing based upon the relative p-values (so, species X showed the lowest p-value for enrichment of category A). The pairwise Fisher Exact tests described above could also be acceptable.
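Both summaries can be sketched in a few lines, assuming you already have one enrichment p-value per GO category per species (for example, from a separate test against each species' own background). Here the Benjamini-Hochberg correction is applied within each species across categories, which is one reasonable choice but an assumption on my part. The function names, the `alpha` parameter, and the nested-dictionary input layout are all hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def summarize(pvals_by_species, alpha=0.05):
    """pvals_by_species: {species: {category: raw enrichment p-value}}.

    Returns (counts, best_species):
      counts[category]       = number of species with FDR < alpha for that category
      best_species[category] = species with the lowest raw p-value for that category
    """
    counts = {}
    for species, cat_p in pvals_by_species.items():
        cats = list(cat_p)
        q = bh_adjust([cat_p[c] for c in cats])
        for c, qv in zip(cats, q):
            counts[c] = counts.get(c, 0) + int(qv < alpha)

    categories = set().union(*(d.keys() for d in pvals_by_species.values()))
    best_species = {
        c: min((s for s in pvals_by_species if c in pvals_by_species[s]),
               key=lambda s: pvals_by_species[s][c])
        for c in categories
    }
    return counts, best_species
```

For example, `summarize({"sp1": {"A": 0.001, "B": 0.5}, "sp2": {"A": 0.02, "B": 0.9}})` would report category A as significant in 2 species and list `sp1` as the species with the lowest p-value for A.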
Understanding the overall goals may also be important. For example, you mention you have a list of pseudogenes, but why are these pseudogenes coming from >10 different species? GO categories are either defined independently for each species or defined by homology to a single, commonly studied species. For example, the human genome has GO categories, but I doubt the chimp genome has an independently curated list of GO categories. If that is the case, I don't think it would make sense to compare the relative contributions of pseudogenes that happen to have the closest homology to a known human gene versus a known chimp gene (assuming all chimp GO definitions really come from human homologs). Instead, you would treat all genes as belonging to a single species (whatever primate is being studied), with the GO categories defined with respect to human homologs (or just the human genes, if you are working with human pseudogenes). Since I don't know the background for this particular comparison, I can't really provide better advice on this aspect.