Question

How To Annotate A Newly Sequenced Genome

5

10.8 years ago

jcastrofigueroa ▴ 140

Hello World! I need your help.

I have many contig sequences from a new microorganism that I'd like to characterize: for example, identify putative genes and assign putative functions to them, and record the domains present with their respective e-values, strand orientation, etc. Finally, I'd like to save that information (for each contig) as a *.embl file.

My first thought is to use gene-prediction software such as GeneMark, but then I don't know what to do next.

Can anybody give me some guidelines or a pipeline for doing this with BioPython?

Thanks

biopython visualization • 11k views

ADD COMMENT • link updated 9.9 years ago by Biostar 20 • written 10.8 years ago by jcastrofigueroa ▴ 140

2

MAKER is a popular pipeline for gene predictions. http://gmod.org/wiki/MAKER

ADD REPLY • link 10.8 years ago by Damian Kao 16k

1

I agree, MAKER is possibly the way to go. Try reading the tutorial http://gmod.org/wiki/MAKER_Tutorial_2012 to understand the basics. Also, I think that language preference should be regarded as secondary.

ADD REPLY • link 10.8 years ago by Michael 54k

1

RAST can also be useful.

ADD REPLY • link 10.7 years ago by jcastrofigueroa ▴ 140

0

We're using MAKER right now on an Amazon EC2 instance. I can't say the installation is especially user-friendly, but we ended up choosing MAKER because it seemed the best option.

ADD REPLY • link 10.8 years ago by Eric Normandeau 11k

score 7 · Answer 1 · 2013-06-20

7

10.8 years ago

cts ★ 1.7k

For bacteria/archaea we use prokka, from the Victorian Bioinformatics Consortium:

Prokka is a software tool to annotate bacterial, archaeal and viral genomes very rapidly, and produce output files that require only minor tweaking to submit to Genbank/ENA/DDBJ

And when they say minor tweaking, it really is minor. Prokka gives you a GenBank file and a Sequin file for rapid upload to NCBI, as well as other files that are useful in different circumstances (like a GFF file). Like any pipeline it has a few dependencies, but Prokka itself is very easy to install.

Unlike MAKER, Prokka does not handle multi-exon gene models (no introns), so it is only useful for bacteria/archaea, but it does annotate proteins, tRNAs and rRNAs. I'm not sure whether MAKER also annotates RNAs (its website doesn't say so, but I may have missed it). Prokka also uses both BLAST and HMMER for functional annotation, with custom subsets of UniRef, CDD, Pfam and TIGRFAMs. Again, the MAKER website only mentions BLAST. (Perhaps others can correct me if I'm wrong; I've never used MAKER.)
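Since the question asks for EMBL output via BioPython, here is a minimal sketch of how a Prokka run could be wrapped and its GenBank output converted. It assumes `prokka` is on the PATH, that the GenBank file is written as `<prefix>.gbk` inside the output folder, and that Biopython can parse that file as-is; the function names, folder and prefix are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_prokka_command(contigs, outdir, prefix, kingdom="Bacteria"):
    """Return the Prokka command line as a list of arguments."""
    return [
        "prokka",
        "--outdir", str(outdir),   # Prokka refuses an existing folder unless --force is given
        "--prefix", prefix,        # base name shared by all output files
        "--kingdom", kingdom,      # e.g. Bacteria, Archaea, Viruses
        str(contigs),
    ]


def genbank_to_embl(gbk_path, embl_path):
    """Convert a (multi-record) GenBank file to EMBL; returns the record count."""
    # Imported lazily so the command builder works without Biopython installed.
    from Bio import SeqIO
    return SeqIO.convert(str(gbk_path), "genbank", str(embl_path), "embl")


def annotate(contigs, outdir="annotation", prefix="mygenome"):
    """Run Prokka on a contig FASTA file, then write its GenBank output as EMBL."""
    subprocess.run(build_prokka_command(contigs, outdir, prefix), check=True)
    # Assumption: Prokka names its GenBank output <prefix>.gbk.
    gbk = Path(outdir) / f"{prefix}.gbk"
    return genbank_to_embl(gbk, gbk.with_suffix(".embl"))
```

Calling `annotate("contigs.fasta")` would then leave `annotation/mygenome.embl` next to the other Prokka outputs, one EMBL record per contig.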

ADD COMMENT • link 10.8 years ago by cts ★ 1.7k

0

I would agree with the last part of this and suggest that you shouldn't use BLAST alone if you want to have accurate functional annotation of the coding sequences that are predicted.

ADD REPLY • link 10.8 years ago by sarahhunter ▴ 600

0

MAKER uses BLAST, Exonerate, and SNAP. I don't know how it compares to Prokka.

ADD REPLY • link 10.8 years ago by Eric Normandeau 11k

0

Prokka has trouble working with SPAdes output. I'm getting "contig ID too long". Various blog posts suggest using some additional flags (e.g. --compliant, --centre), but so far I have not been able to solve the issue. I know some people have reverted to older versions of Prokka, as the bug was introduced relatively recently. For now I cannot really recommend Prokka. If it stumbles at the first obstacle, I wonder how accurate it is at the more complex stuff.

ADD REPLY • link 7.9 years ago by Nick ▴ 290

0

If you want, I wrote a script to rename contigs assembled by SPAdes so that they can be annotated with Prokka.
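For readers who just need the idea, a renaming step of this kind could look roughly like the sketch below. This is not the script mentioned above, only an illustration: the `contig_000001` naming scheme, the function names and the 37-character default limit are assumptions, so adjust the limit to whatever your Prokka version reports.

```python
def rename_fasta(lines, prefix="contig", max_len=37):
    """Shorten FASTA headers to sequential IDs.

    Returns (new_lines, mapping), where mapping links each original
    contig ID to its new short ID so results can be traced back.
    """
    new_lines = []
    mapping = {}
    count = 0
    for line in lines:
        if line.startswith(">"):
            count += 1
            parts = line[1:].split()
            old_id = parts[0] if parts else ""
            new_id = f"{prefix}_{count:06d}"
            if len(new_id) > max_len:
                raise ValueError(f"ID {new_id!r} exceeds {max_len} characters")
            mapping[old_id] = new_id
            new_lines.append(f">{new_id}\n")
        else:
            # Sequence lines are passed through unchanged.
            new_lines.append(line)
    return new_lines, mapping


def rename_file(in_path, out_path, prefix="contig"):
    """Rewrite a FASTA file with short contig IDs; returns the ID mapping."""
    with open(in_path) as src:
        new_lines, mapping = rename_fasta(src, prefix)
    with open(out_path, "w") as dst:
        dst.writelines(new_lines)
    return mapping
```

Keeping the returned mapping (for example by writing it to a tab-separated file) makes it possible to relate the annotated contigs back to the original SPAdes node names and coverage values.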

ADD REPLY • link 7.8 years ago by MathGon ▴ 10