I have started working on a project somewhat outside my usual area of expertise, so I am not (yet) familiar with all the relevant tools and algorithms. I thought I might find people with more experience here.
How would you rank a list of genomes according to the presence or absence of certain genes of interest? The goal is to curate a set of genomes (thousands of them) based on the presence or absence of particular traits (e.g. presence of a bacteriocin gene and absence of antibiotic resistance genes).
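One simple way to frame this is as a weighted score. This is a sketch under my own assumptions, not an established tool. Give each gene of interest a weight, positive for desired traits and negative for unwanted ones. Score each genome as the sum of the weights of the genes it carries, then sort by that score. The gene names, genome IDs and weights below are hypothetical placeholders.

```python
# Minimal sketch of weighted presence/absence ranking.
# Gene names, genome IDs and weights are hypothetical placeholders.
from typing import Dict, List, Set, Tuple


def score_genome(genes: Set[str], weights: Dict[str, float]) -> float:
    """Sum the weights of the genes of interest that are present in a genome."""
    return sum((w for gene, w in weights.items() if gene in genes), 0.0)


def rank_genomes(genomes: Dict[str, Set[str]],
                 weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (genome_id, score) pairs, highest score first (ties broken by ID)."""
    scored = [(gid, score_genome(genes, weights)) for gid, genes in genomes.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Positive weight = desired trait, negative weight = unwanted trait.
    weights = {"bacteriocin_A": 2.0, "bacteriocin_B": 1.0, "amr_gene_X": -3.0}
    genomes = {
        "genome1": {"bacteriocin_A", "amr_gene_X"},
        "genome2": {"bacteriocin_A", "bacteriocin_B"},
        "genome3": set(),
    }
    for gid, score in rank_genomes(genomes, weights):
        print(gid, score)
```

With these toy weights, genome2 ranks first and genome1, which carries the resistance gene, ranks last. Making the resistance weight larger in magnitude than any bacteriocin weight means a single unwanted gene outweighs one desired gene. This sketch only uses presence or absence, so copy number is ignored.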
I don't want to filter the genomes but rather rank them. The approach could also work on proteomes, using Pfam domains instead of genes.
Is there a specific algorithm or piece of software for this? I was thinking about counting and weighting COGs in a genome annotation file, or counting HMM hits on a proteome. I did some light Google searching but couldn't find anything that suited my needs.
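For the HMM route, one possible approach is to run HMMER's `hmmsearch` with `--tblout` on each proteome against the profiles of interest (e.g. Pfam HMMs). You can then count which profiles have hits below an E-value cutoff and turn those counts into a weighted presence/absence score. This is a sketch, not a tested pipeline. The cutoff, the profile names (`Bacteriocin_IIc`, `AMR_profile`) and the weights are hypothetical. The parser assumes the standard whitespace-delimited `--tblout` layout: query name in column 3 and full-sequence E-value in column 5.

```python
# Minimal sketch: count which HMM profiles hit a proteome, using the
# per-sequence table written by `hmmsearch --tblout`.
# The E-value cutoff, profile names and weights are hypothetical.
from collections import Counter
from typing import Dict, Iterable


def count_hmm_hits(tblout_lines: Iterable[str], max_evalue: float = 1e-5) -> Counter:
    """Count target sequences per query profile with full-sequence E-value <= max_evalue."""
    counts: Counter = Counter()
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        query_name = fields[2]     # column 3: query (HMM profile) name
        evalue = float(fields[4])  # column 5: full-sequence E-value
        if evalue <= max_evalue:
            counts[query_name] += 1
    return counts


def weighted_score(hits: Counter, weights: Dict[str, float]) -> float:
    """Presence/absence score: add a profile's weight if it has at least one hit."""
    return sum((w for profile, w in weights.items() if hits[profile] > 0), 0.0)


if __name__ == "__main__":
    # Toy --tblout lines (19 whitespace-separated columns each).
    example = [
        "# target name  accession  query name  accession  E-value ...",
        "prot_001 - Bacteriocin_IIc - 1e-20 70.1 0.1 1e-20 70.0 0.1 1.0 1 0 0 1 1 1 1 -",
        "prot_002 - AMR_profile - 0.5 5.0 0.0 0.5 5.0 0.0 1.0 1 0 0 1 1 1 0 -",
    ]
    hits = count_hmm_hits(example)
    print(hits)
    print(weighted_score(hits, {"Bacteriocin_IIc": 2.0, "AMR_profile": -3.0}))
```

Here the bacteriocin hit passes the cutoff and the weak AMR hit does not, so the proteome scores 2.0. For Pfam models, `hmmsearch --cut_ga` (the curated gathering thresholds) may be preferable to a fixed E-value. `--domtblout` gives domain-level rather than per-sequence hits. The same scoring would apply if the counts came from COG IDs in an annotation table instead.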