Question: Ranking bacterial genomes using present and absent genes information
8 months ago, Elendol (Ireland, Dublin, UCD) wrote:


I have started working on a project somewhat outside my usual area of expertise, so I am not yet familiar with all the relevant tools and algorithms. I thought I might find people with more experience here.

How would you rank a list of genomes according to the presence or absence of certain genes of interest? The goal is to curate a set of genomes (thousands) based on the presence and absence of some traits (e.g. presence of a bacteriocin and absence of antibiotic resistance genes).

I don't want to filter genomes but to rank them. It could also work with proteomes, using PFAM domains instead of genes.

Is there a specific algorithm or software for this job? I was thinking about counting and weighting COGs in a genome annotation file, or counting HMM hits on a proteome. I did some light Google searching and couldn't find anything that suited me.
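
To make the "counting HMM hits on a proteome" idea concrete, here is a minimal Python sketch. It assumes the hits come from `hmmsearch --tblout`-style text, where the third whitespace-separated column is the query (HMM) name and the fifth is the full-sequence E-value. The HMM names, protein IDs, and the E-value cutoff are hypothetical placeholders, not real data.

```python
from collections import Counter

def count_hmm_hits(tblout_text, max_evalue=1e-5):
    """Count the proteins hit by each HMM in hmmsearch --tblout style text."""
    counts = Counter()
    for line in tblout_text.splitlines():
        # Skip blank lines and the '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        query_name = fields[2]     # HMM name (assumes hmmsearch, not hmmscan)
        evalue = float(fields[4])  # full-sequence E-value
        if evalue <= max_evalue:
            counts[query_name] += 1
    return counts

# Hypothetical example rows; names and values are made up for illustration.
example = """\
# target name  accession  query name  accession  E-value  score  bias
prot_001  -  bacteriocin_hmm  -  1.2e-20  75.3  0.1
prot_014  -  bacteriocin_hmm  -  3.0e-02  10.1  0.0
prot_020  -  resistance_hmm   -  4.5e-50  170.2  0.3
"""

hit_counts = count_hmm_hits(example)
print(hit_counts)
```

With `hmmscan` output the roles of target and query are swapped, so the column index for the HMM name would need to change.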

bacteria genome • 163 views

The first thing that springs to mind would be to rank the output of a pangenome/core genome tool like Roary.

It will ultimately spit out a list of gene clusters, approximately of the form:

 LocusTagX, LocusTagY, LocusTagZ...

And it will do this for every gene cluster. Broadly speaking, the locus tags that appear most frequently would be highest up your ranking for presence (since each locus tag should correspond to a particular input genome).

It actually also outputs binary presence-absence alignments/trees which could be of use too.
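
As a rough illustration of using such a binary matrix, the Python sketch below reads a tab-separated presence/absence table and ranks genomes by how many gene clusters they carry. It assumes a layout with a leading gene column followed by one 0/1 column per genome (similar to Roary's `gene_presence_absence.Rtab`). The genome and cluster names are hypothetical.

```python
import csv
import io

def presence_by_genome(rtab_text):
    """Map each genome to the set of gene clusters flagged as present (1)."""
    reader = csv.reader(io.StringIO(rtab_text), delimiter="\t")
    header = next(reader)
    genomes = header[1:]  # first column holds the gene cluster name
    present = {genome: set() for genome in genomes}
    for row in reader:
        if not row:
            continue
        gene = row[0]
        for genome, flag in zip(genomes, row[1:]):
            if flag == "1":
                present[genome].add(gene)
    return present

# Hypothetical matrix: three clusters, three genomes.
example_rtab = (
    "Gene\tgenomeA\tgenomeB\tgenomeC\n"
    "cluster1\t1\t1\t0\n"
    "cluster2\t1\t0\t0\n"
    "cluster3\t1\t1\t1\n"
)

present = presence_by_genome(example_rtab)
# Rank genomes by the number of clusters present, highest first.
ranking = sorted(present, key=lambda g: len(present[g]), reverse=True)
print(ranking)
```

Restricting the rows to the clusters of interest before counting would turn this into a ranking on specific traits instead of overall gene content.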

written 8 months ago by Joe

Hi Joe,

Yes, that's what I do for part of my project, but working on a limited number of genomes.

The downside of this method is that it's a bit "too smart": I need to load all the genomes, perform the pangenome analysis, and from there analyse the loci. I was doing something much simpler that would focus only on my genes of interest/disinterest: use a weighting system, find the genes in a genome, give the genome a score, then sort the genomes by score. Eventually I would be able to add more genomes, score them, etc.
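
The simpler approach could look something like the Python sketch below: positive weights for wanted genes, negative weights for unwanted ones, and a per-genome sum. The gene names, weights, and genome contents are all hypothetical, and how the genes are detected in each genome (BLAST, HMMs, annotation lookup) is left out.

```python
def score_genome(genes_present, weights):
    """Sum the weights of every gene of interest found in the genome."""
    return sum(w for gene, w in weights.items() if gene in genes_present)

# Hypothetical weights: reward a bacteriocin, penalise a resistance gene.
weights = {"bacteriocin_X": 2.0, "resistance_Y": -3.0}

# Hypothetical detection results: genome name -> genes found in it.
genomes = {
    "genome1": {"bacteriocin_X"},
    "genome2": {"bacteriocin_X", "resistance_Y"},
    "genome3": set(),
}

scores = {name: score_genome(genes, weights) for name, genes in genomes.items()}
# Sort genomes from highest to lowest score.
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for name, score in ranking:
    print(name, score)
```

Because each genome is scored independently of the others, new genomes can be scored and merged into the ranking later without recomputing anything for the existing ones.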

written 8 months ago by Elendol