Automated Microbial Gene Prediction From Assembled Genomes -- What Is The Latest, Most Accurate Software?
11.0 years ago
JacobS ▴ 980

I would like to get a feel for the strategies other bioinformaticians use for genome-wide gene prediction in prokaryotes. There doesn't seem to be much recent research into improving automated gene prediction for single organisms; the latest articles mostly concern communities and metagenomics applications.

I am especially curious because these three methods all returned results that differed by about 10% when tested on the same genome. As I am new to gene prediction, I am interested to hear about other helpful software and in-house analysis techniques. How do these approaches differ when a reference annotated genome exists for training?

Here is the community-sourced list of software packages:

And web/literature sources:

Bacterial genome annotation systems

Please continue to add suggestions to this list, and I will update regularly as well!

assembly training • 6.4k views
11.0 years ago
rtliu ★ 2.2k

GenePRIMP from JGI was designed to solve your problem.

GenePRIMP stands for "Gene PRediction IMprovement Pipeline". The GenePRIMP pipeline consists of a series of computational units that identify erroneous gene calls and missed genes, and then correct a subset of the identified anomalous features. The data input to GenePRIMP needs to be a file of gene calls in GenBank or EMBL format. As its output, GenePRIMP generates reports of identified anomalies, plus a corrected EMBL file.

This blog (Bacterial genome annotation systems) also provides useful information.

11.0 years ago

In addition to whatever software you use to predict genes in the bacterial genome, make sure you run your sequence (translated in all six frames, with stop codons written as X) against a protein domain database such as Pfam. This helps to identify regions missed by the gene-calling software, and it also reveals indels and false stop codons introduced by sequencing errors.
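To make the six-frame step concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the function names are hypothetical, the standard codon assignments are used (bacterial translation table 11 assigns the same amino acids and differs only in allowed start codons), and any codon containing an ambiguous base is also written as X. The resulting protein strings would still need to be written to FASTA and searched against Pfam with whatever domain-search tool you prefer.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G); '*' marks stops.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, stop_symbol="X"):
    """Translate codon by codon, writing stops (and unknown codons) as X."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # ambiguous codon -> X
        protein.append(stop_symbol if aa == "*" else aa)
    return "".join(protein)


def six_frame_translations(dna):
    """Return a dict of frame label ('+1'..'+3', '-1'..'-3') -> protein."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Toy sequence, not real data.
    for label, protein in six_frame_translations("ATGAAATAGGCC").items():
        print(label, protein)
```

Keeping stops as X rather than splitting at them means a domain hit can span a false stop codon, which is what lets sequencing errors show up in the alignment.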

11.0 years ago
Josh Herr 5.8k

Here's a good list of all the prediction programs out there. The difference you see between the programs is due to differences in motif calling. I've really only used AUGUSTUS (which works well IMO), but I'm fairly certain it uses eukaryotic motifs, so I don't know how well it would work for bacteria and archaea (probably not at all).


@Josh, thanks for the list -- I've definitely seen this one before and have worked my way through it. The problem is that much of the software listed there is over 10 years old, and several newer programs, such as Prodigal, are missing. I'm looking more for people who do this often to report on their strategies.


+1, I knew I wasn't going to be much help, but I hope some other people report their experiences with gene prediction so we both can learn a bit. So far, AUGUSTUS has been good for me, but I would like to learn about more options including what new resources people are using.

11.0 years ago

Worth trying: ISGA, DIYA, BASys, MAKER (more eukaryotic, though) and ARGOT. They are equally useful, if I am not mistaken, though I am not sure of their comparative performance.
