Question

Annotation pipelines in 2018

1

6.2 years ago

Ric ▴ 440

Hi, I would like to annotate a plant genome. I also have some RNA-Seq data.

Here ( https://www.sunflowergenome.org/annotations/ ) it is described that the following tools were used to annotate the sunflower genome:

https://urgi.versailles.inra.fr/Tools/REPET
Gene predictions from EuGene
Gene predictions from AUGUSTUS
Gene predictions from SNAP
Transcript assemblies from StringTie
Maker

I also found other pipelines:

and there are even more here ( https://omictools.com/genome-annotation-category )

Which one to choose?

Thank you in advance.

assembly RNA-Seq genome gene • 5.8k views

ADD COMMENT • link updated 2.8 years ago by BioinformaticBird ▴ 110 • written 6.2 years ago by Ric ▴ 440

0

I would like to annotate a plant genome.

Can you provide some stats about how good that assembly is now (number of contigs, average length, N50)? It is unlikely to be complete, so you should keep your expectations in line with that.
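For anyone unsure how to get those numbers, here is a minimal Python sketch that computes contig count, mean length, and N50 from a list of contig lengths. The function names (`fasta_lengths`, `assembly_stats`) and the toy lengths are made up for illustration and are not from any particular tool; dedicated assembly-statistics programs will report more than this.

```python
def fasta_lengths(path):
    """Return the sequence length of every record in a FASTA file."""
    lengths = []
    current = 0
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one
                if seen_header:
                    lengths.append(current)
                seen_header = True
                current = 0
            elif seen_header:
                current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


def assembly_stats(contig_lengths):
    """Return contig count, total size, mean length, and N50."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        # N50 is the length of the contig at which the cumulative sum
        # (largest contigs first) first reaches half of the assembly size
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(lengths),
        "total": total,
        "mean_length": total / len(lengths),
        "n50": n50,
    }


# Toy lengths, purely illustrative (not the assembly discussed here)
stats = assembly_stats([100, 200, 300, 400, 500])
print(stats)
```

On a real assembly you would call `assembly_stats(fasta_lengths("assembly.fasta"))`, where the file name is a placeholder for your own contig file.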

ADD REPLY • link 6.2 years ago by GenoMax 146k

0

The genome is an allotetraploid, about 3 Gb in size, assembled into 5000 contigs. The N50 is 1.3 Mb.

ADD REPLY • link 6.2 years ago by Ric ▴ 440

0

Looks decent at first sight, but given that genome size, be prepared to spend time on it, as GenoMax mentioned.

ADD REPLY • link 6.2 years ago by lieven.sterck 15k

0

Hello, I want to convert AB1 files to FASTQ.

ADD REPLY • link 4.6 years ago by sekoubagayoko988 • 0

1

Please don't ask unrelated questions in pre-existing threads. Please open a new thread if you want to ask a well-formulated question.

ADD REPLY • link 4.6 years ago by GenoMax 146k

score 2 · Answer 1 · 2020-03-14

2

4.6 years ago

Juke34 8.8k

I made my own list of annotation tools here: https://github.com/NBISweden/GAAS/blob/master/annotation/knowledge/annotation_tools_genome.md

You can look at the pipeline category.

ADD COMMENT • link 4.2 years ago by Juke34 8.8k

score 1 · Answer 2 · 2018-08-10

1

6.2 years ago

lieven.sterck 15k

It all depends on how serious (high quality) you want the result to be. If you only want a global idea of what the annotation would look like, I assume any of the pipelines you mention will do.

The protocol described in the sunflower paper will deliver a nice result but is much more work (compute, time, manpower, ...) to run than the pipeline packages. Being a big fan of EuGene, I can certainly recommend that one, but keep in mind that it will require some tweaking and time investment to obtain the best result.

Generally, keep in mind that the bigger your genome is and the more data you want to feed into the pipelines, the more computational power and time you will need.

ADD COMMENT • link 6.2 years ago by lieven.sterck 15k

0

The expectation is to get a high-quality annotation. Do you run only EuGene, and how do you know which parameters have to be tweaked?

ADD REPLY • link 6.2 years ago by Ric ▴ 440

0

Parameters may depend on your genome, since "one size fits all" will not apply.

Be prepared to spend much longer on annotation than you did on sequencing and assembly. While parts of the annotation can be automated, human intervention will be required in many places, and the fact that this genome is an allotetraploid is going to make your task that much harder.

If you intend to make the genome public then you can leverage NCBI's Eukaryotic annotation pipeline.
Edit: Request annotation link on NCBI's Eukaryotic annotation page is not working. I have emailed their support.

Edit 2: NCBI support agreed that the wording/link is misleading. Clicking on the "Request Annotation" takes you to a help desk ticket page. You are supposed to fill out a request for annotation. They will then get in touch with you.

ADD REPLY • link 6.2 years ago by GenoMax 146k

0

Does this mean if you intend to make the genome public you can request NCBI to do the annotation?

ADD REPLY • link 5.8 years ago by olechnwin ▴ 60

1

I believe that is the case. You can use the information above to email them and ask.

ADD REPLY • link 5.8 years ago by GenoMax 146k

0

Yes, usually EuGene is our 'end' tool, with which we combine all the other data, but many other recipes are possible.

You will definitely need to do parameter optimization, which can take quite some time, but the end result will reflect the effort you put into it!

ADD REPLY • link 6.2 years ago by lieven.sterck 15k

score 0 · Answer 3 · 2018-08-10

0

6.2 years ago

Rox ★ 1.4k

Hello !

I have never worked with plant genomes, but I've heard that MAKER has a pipeline adapted specifically for plant genomes. Maybe you can find what you need here: http://www.yandell-lab.org/software/maker-p.html

Cheers,

Roxane

ADD COMMENT • link 6.2 years ago by Rox ★ 1.4k

0

Correct, it is part of the Maker "family". However, after all these years I still have to figure out what makes this particular version suited to plant genomes, especially compared to the 'normal' Maker.

The problem with all those pipelines is that there is no "one size fits all", as GenoMax mentioned, while that is exactly what they try to offer.

Anyway, a good annotation can be made with several pieces of software, and a bad annotation can be made with any software if you're not paying attention to the details ;-)
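As one small example of "paying attention to the details", here is a hedged Python sketch of a first sanity check on a finished annotation: counting GFF3 features by type and looking at exons per transcript. It assumes the pipeline writes standard 9-column GFF3 with `gene`, `mRNA`, and `exon` feature types; the function name `count_feature_types` and the example records are invented for illustration. It is no substitute for proper evaluation, only a quick way to spot obviously odd output.

```python
from collections import Counter


def count_feature_types(gff3_lines):
    """Count GFF3 features by the type given in column 3."""
    counts = Counter()
    for line in gff3_lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments, and directives such as ##gff-version
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A valid GFF3 feature line has exactly 9 tab-separated columns;
        # anything else (e.g. embedded FASTA sequence) is ignored
        if len(fields) != 9:
            continue
        counts[fields[2]] += 1
    return counts


# Invented records, only to show the expected input shape
example = [
    "##gff-version 3",
    "chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=g1",
    "chr1\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=g1.t1;Parent=g1",
    "chr1\tdemo\texon\t100\t400\t.\t+\t.\tParent=g1.t1",
    "chr1\tdemo\texon\t600\t900\t.\t+\t.\tParent=g1.t1",
]

counts = count_feature_types(example)
exons_per_transcript = counts["exon"] / counts["mRNA"]
print(dict(counts), exons_per_transcript)
```

With a real file you could pass an open file handle instead of the list. A gene count or exons-per-transcript value far from what is known for related species is a hint to revisit the parameters.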

ADD REPLY • link 6.2 years ago by lieven.sterck 15k

score 0 · Answer 4 · 2021-05-27

Hello,

We have developed a gene annotator called FINDER, which can annotate eukaryotic genomes using short-read RNA-Seq data and protein sequences. It is completely automated and requires no manual intervention. FINDER also runs BRAKER to incorporate predicted genes into the repertoire. The FINDER paper has been published, and the software is available on GitHub.

Thank you.

score 0 · Answer 5 · 2022-01-27

In the hope of keeping this thread alive, I suggest the most accessible eukaryotic genome annotation pipeline/framework: MOSGA. https://mosga.mathematik.uni-marburg.de

It is accessible through a web interface and does not require any installation, although it is also possible to install it locally or run it in a Docker container.

Gene predictions can be performed ab initio or based on RNA-Seq, protein, or orthology evidence.