Question: How To Annotate A Newly Sequenced Genome
3.9 years ago by
Norwich, UK
jcastrofigueroa140 wrote:

Hello World! I need your help.

I have many contig sequences from a new microorganism that I'd like to characterize: identify putative genes and assign putative functions to them, record the domains present with their respective e-values, the strand orientation, etc., and finally save that information (for each contig) as a *.embl file.

My first thought is to use gene-prediction software such as GeneMark to identify putative genes, but after that I don't know what to do.

Can anybody give me some guidelines, or suggest a pipeline, for doing this with BioPython?


visualization biopython • 4.3k views
modified 3.0 years ago by Biostar ♦♦ 20 • written 3.9 years ago by jcastrofigueroa140

MAKER is a popular pipeline for gene predictions.

written 3.9 years ago by Damian Kao14k

I agree, MAKER is possibly the way to go. Try reading the tutorial to understand the basics. Also, I think that language preference should be regarded as secondary.

written 3.9 years ago by Michael Dondrup41k

Also RAST can be useful.

written 3.8 years ago by jcastrofigueroa140

We're using MAKER right now on an Amazon EC2 instance. I can't say the installation is especially user-friendly, but we ended up choosing MAKER because it seemed the best option.

written 3.9 years ago by Eric Normandeau9.4k
3.9 years ago by
cts1.5k wrote:

For bacteria/archaea we use Prokka, from the Victorian Bioinformatics Consortium:

Prokka is a software tool to annotate bacterial, archaeal and viral genomes very rapidly, and produce output files that require only minor tweaking to submit to Genbank/ENA/DDBJ

And when they say minor tweaking, it really is minor. Prokka gives you a GenBank file and a Sequin file for rapid upload to NCBI, as well as other files that are useful in different circumstances (like a GFF file). Like any pipeline it has a few dependencies, but Prokka itself is very easy to install.

In comparison to MAKER, Prokka does not handle multi-exon gene models (no introns), so it is only useful for bacteria/archaea, but it does annotate proteins, tRNAs and rRNAs. I'm not sure whether MAKER will also annotate the RNAs (it doesn't say so on their website, but I may have missed it). Prokka also uses both BLAST and HMMER for functional annotation, searching custom subsets of UniRef, CDD, Pfam and TIGRFAMs. Again, the MAKER website only mentions using BLAST. (Perhaps others can correct me if I'm wrong; I've never used MAKER.)

modified 3.9 years ago • written 3.9 years ago by cts1.5k
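
To illustrate how the GFF file mentioned above could feed the per-contig summary the question asks for (feature type, coordinates, strand, product), here is a minimal sketch using only the Python standard library. It assumes a standard nine-column GFF3 layout with a `product=` attribute and an optional `##FASTA` section at the end; the function name `parse_gff3_features` and the example records are invented for illustration and are not taken from real Prokka output.

```python
from urllib.parse import unquote


def parse_gff3_features(lines, feature_types=("CDS", "tRNA", "rRNA")):
    """Collect basic annotation fields from GFF3 lines.

    Hypothetical helper: assumes nine tab-separated columns per feature
    and key=value attributes separated by semicolons.
    """
    features = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # anything after this marker is sequence, not annotation
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] not in feature_types:
            continue
        attrs = {}
        for pair in cols[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attrs[key] = unquote(value)  # GFF3 percent-encodes some characters
        features.append({
            "contig": cols[0],
            "type": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "product": attrs.get("product", ""),
        })
    return features


# Made-up example records, for demonstration only.
example_gff = """##gff-version 3
##sequence-region contig_000001 1 5000
contig_000001\texample\tCDS\t101\t1300\t.\t+\t0\tID=EX_00001;product=hypothetical protein
contig_000001\texample\ttRNA\t2001\t2075\t.\t-\t.\tID=EX_00002;product=tRNA-Ala
##FASTA
>contig_000001
ATGC
""".splitlines()

features = parse_gff3_features(example_gff)
for feat in features:
    print(feat["contig"], feat["type"], feat["start"], feat["end"],
          feat["strand"], feat["product"], sep="\t")
```

For the *.embl output itself, Biopython's `SeqIO.convert(in_file, "genbank", out_file, "embl")` should in principle turn a GenBank file into EMBL format, although whether a given pipeline's GenBank output parses cleanly is something to check on your own data.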

I would agree with the last part of this and suggest that you shouldn't use BLAST alone if you want to have accurate functional annotation of the coding sequences that are predicted.

written 3.9 years ago by sarahhunter590

MAKER uses BLAST, Exonerate and SNAP. I don't know how it compares to Prokka.

written 3.9 years ago by Eric Normandeau9.4k

Prokka has trouble working with SPAdes assemblies. I'm getting "contig ID too long". Various blog posts suggest using some additional flags (e.g. --compliant, --centre), but so far I have not been able to solve the issue. I know some people have gone back to older versions of Prokka, as the bug was introduced relatively recently. So far I cannot really recommend Prokka. If it stumbles at the first obstacle, I wonder how accurate it is at the more complex stuff.

written 11 months ago by Nick210

If you want, I wrote a script to rename contigs assembled by SPAdes so that Prokka annotation can be run on them.

written 11 months ago by MathGon10
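
The script mentioned above is not shown in the thread, so the following is only a rough sketch of what such a renaming step might look like, not MathGon's actual code. It assumes the long IDs come from SPAdes-style FASTA headers of the form `NODE_<n>_length_<len>_cov_<cov>`, and replaces each header with a short sequential ID while keeping a map back to the original name. The function name `rename_contigs`, the `contig` prefix and the example sequences are all made up for illustration.

```python
import io


def rename_contigs(fasta_in, fasta_out, prefix="contig", width=6):
    """Copy a FASTA stream, replacing each header with a short sequential ID.

    Hypothetical helper: returns a dict mapping each new ID to the
    original header text so the renaming can be traced back later.
    """
    mapping = {}
    count = 0
    for line in fasta_in:
        if line.startswith(">"):
            count += 1
            new_id = f"{prefix}_{count:0{width}d}"
            mapping[new_id] = line[1:].strip()
            fasta_out.write(f">{new_id}\n")
        else:
            fasta_out.write(line)  # sequence lines are copied unchanged
    return mapping


# Made-up input in the style of SPAdes headers, for demonstration only.
example = io.StringIO(
    ">NODE_1_length_248731_cov_35.126418\nATGCATGC\nGGCC\n"
    ">NODE_2_length_1024_cov_12.5\nTTAA\n"
)
renamed = io.StringIO()
id_map = rename_contigs(example, renamed)

print(renamed.getvalue(), end="")
for new_id, old_header in id_map.items():
    print(new_id, old_header, sep="\t")
```

With real files, the same function can be called on handles from `open("contigs.fasta")` and `open("contigs.renamed.fasta", "w")`. Choose a prefix and width that keep the IDs under whatever length limit your Prokka version reports in its error message.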