Ranking bacterial genomes using gene presence/absence information
4.2 years ago
Elendol • 0

Hi,

I have started working on a project that is a bit outside my usual area of expertise, so I am not yet aware of all the tools and algorithms. I thought I might find people with more experience here.

How would you rank a list of genomes according to the presence or absence of certain genes of interest? The goal is to curate a set of genomes (thousands) based on the presence and absence of some traits (e.g. presence of a bacteriocin and absence of antibiotic resistance genes).

I don't want to filter genomes, only rank them. The approach could also work on proteomes, using Pfam domains instead of genes.

Is there a specific algorithm or piece of software for this job? I was thinking about counting and weighting COGs in a genome annotation file, or counting HMM hits on a proteome. I did some light Google searching and couldn't find anything that suited my needs.

genome bacteria • 605 views

The first thing that springs to mind would be to rank the output of a pangenome/core genome tool like Roary.

It will ultimately spit out a list of gene clusters, approximately of the form:

 LocusTagX, LocusTagY, LocusTagZ...

It will do this for every gene cluster. Broadly speaking, the genomes whose locus tags appear most frequently would be highest up your ranking for presence (since each locus tag should correspond to a particular input genome).

It actually also outputs binary presence-absence alignments/trees which could be of use too.
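As a rough illustration of ranking from a presence-absence matrix, here is a minimal Python sketch. It assumes a tab-separated table with one row per gene cluster, one column per genome, and 0/1 values, which is how I understand Roary's `gene_presence_absence.Rtab` to be laid out; check your own output before relying on it. The function name and the gene and genome names are made up for the example.

```python
import csv
import io


def rank_by_presence(rtab_text, genes_of_interest):
    """Rank genomes by how many genes of interest they carry.

    rtab_text: tab-separated text, first column is the gene/cluster name,
    remaining columns are genomes holding 0 (absent) or 1 (present).
    This layout is an assumption about the input file.
    """
    reader = csv.reader(io.StringIO(rtab_text), delimiter="\t")
    header = next(reader)
    genomes = header[1:]
    counts = {genome: 0 for genome in genomes}
    for row in reader:
        # Skip blank lines and clusters we do not care about.
        if not row or row[0] not in genes_of_interest:
            continue
        for genome, value in zip(genomes, row[1:]):
            counts[genome] += int(value)
    # Highest count first; ties broken alphabetically by genome name.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy table with hypothetical cluster and genome names.
example = (
    "Gene\tgenomeA\tgenomeB\tgenomeC\n"
    "bacA\t1\t0\t1\n"
    "bacB\t1\t0\t0\n"
    "other\t0\t1\t1\n"
)
ranking = rank_by_presence(example, {"bacA", "bacB"})
for genome, n_present in ranking:
    print(genome, n_present)
```

For a real run you would read the file contents from disk and pass in the cluster names that correspond to your genes of interest.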


Hi Joe,

Yes, that's what I do for part of my project, but with a limited number of genomes.

The downside of this method is that it's a bit "too smart": I need to load all the genomes, perform the pangenome analysis, and then analyse the loci from there. I was thinking of something much simpler that would focus only on my genes of interest (and disinterest): apply some weighting system, find the genes in a genome, give the genome a score, then sort the genomes by score. Eventually I would be able to add more genomes, score them, and so on.
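The simpler scoring idea could be sketched roughly like this in Python. It assumes the gene or domain hits per genome have already been found by some search step (for example an HMM or BLAST search, not shown) and are available as a set of feature names per genome. The weights, feature names, and genome names are all hypothetical placeholders.

```python
# Hypothetical weights: positive for wanted traits, negative for unwanted ones.
WEIGHTS = {
    "bacteriocin_A": 2.0,
    "bacteriocin_B": 1.0,
    "amr_gene_X": -3.0,
}


def score_genome(hits, weights=WEIGHTS):
    """Sum the weights of every weighted feature present in this genome."""
    return sum(w for feature, w in weights.items() if feature in hits)


def rank_genomes(genome_hits, weights=WEIGHTS):
    """Return (genome, score) pairs sorted from best to worst score."""
    scored = [(name, score_genome(hits, weights))
              for name, hits in genome_hits.items()]
    # Highest score first; ties broken alphabetically by genome name.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy input: genome name -> set of features detected in that genome.
genome_hits = {
    "genome1": {"bacteriocin_A"},
    "genome2": {"bacteriocin_A", "amr_gene_X"},
    "genome3": set(),
}
ranked = rank_genomes(genome_hits)
for name, score in ranked:
    print(name, score)
```

Because each genome is scored independently, new genomes can be scored later and merged into the existing ranking without redoing anything for the old ones, which is the main difference from a pangenome run.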
