I've been doing some core genome analysis lately with OrthoMCL, and have begun to separate out genes based on their 'core-ness' (i.e. whether they are paralogous, soft-core, full-core and so on).
I'd like to profile the genes remaining in these datasets. For instance, how many of the 2400+ genes that are reported as core are involved in, e.g., metabolism, virulence, DNA replication and so forth?
What sorts of programs/tools are available - and ideally reasonably up to date - that could give me something like this?
Some that I'm familiar with, like DAVID, GO, KEGG etc., seem to be getting a little out of date, and can be very picky about which database identifiers they accept. A colleague mentioned Scoary as an option, but as I didn't do the analysis with Roary, I would have to manually build the presence/absence CSV file it uses as input, and even then, as far as I'm aware, it doesn't give me ontology/pathway analysis.
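For what it's worth, building a presence/absence table by hand from OrthoMCL output doesn't have to be painful. Below is a minimal sketch, assuming the usual OrthoMCL groups format (`group_id: taxon|gene taxon|gene ...`). The function names, group IDs and strain names are made up for illustration, and the output is a plain 0/1 matrix: any extra Roary-style annotation columns that Scoary may expect are not reproduced here, so check its documentation before feeding this in.

```python
import csv
import io
from collections import defaultdict


def parse_orthomcl_groups(handle):
    """Parse lines of the form 'group_id: taxon|gene taxon|gene ...'.

    Returns {group_id: {taxon: [gene, ...]}}.
    """
    groups = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        group_id, members = line.split(":", 1)
        per_taxon = defaultdict(list)
        for member in members.split():
            taxon, gene = member.split("|", 1)
            per_taxon[taxon].append(gene)
        groups[group_id.strip()] = dict(per_taxon)
    return groups


def presence_absence_rows(groups):
    """Build a header row plus one 0/1 row per orthologous group."""
    taxa = sorted({t for members in groups.values() for t in members})
    rows = [["Gene"] + taxa]
    for group_id, members in groups.items():
        rows.append([group_id] + [1 if t in members else 0 for t in taxa])
    return rows


# Hypothetical two-group example standing in for a real groups.txt file.
example = io.StringIO(
    "OG0001: strainA|a1 strainB|b1 strainC|c1\n"
    "OG0002: strainA|a2 strainA|a3 strainC|c2\n"
)
groups = parse_orthomcl_groups(example)
rows = presence_absence_rows(groups)

# Write the matrix as CSV (to an in-memory buffer here; use open(...) for a file).
out = io.StringIO()
csv.writer(out).writerows(rows)
```

The per-taxon gene lists kept by `parse_orthomcl_groups` also make it easy to flag groups containing paralogs (any taxon with more than one gene), which is roughly the 'core-ness' split described above.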
OrthoDB is constantly updated, and provides a number of tools, precomputed files and an API to access the data.