Can you provide some stats about how good that assembly is now (# of contigs, average length, N50)? It is unlikely to be complete, so keep your expectations in line with that.
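If you do not have those numbers at hand, here is a minimal Python sketch of how they could be computed from a FASTA file of contigs. The function names (`fasta_lengths`, `assembly_stats`) are made up for illustration, and N50 is taken here as the length of the contig at which the running sum of lengths, sorted longest first, reaches half of the total assembly size. Dedicated assembly QC tools report more than this, so treat it as a quick check only.

```python
def fasta_lengths(lines):
    """Return the sequence length of every record in a FASTA stream.

    `lines` can be an open file handle or any iterable of text lines.
    """
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths):
    """Compute contig count, total size, mean length and N50."""
    if not lengths:
        raise ValueError("no contigs given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        # N50: first contig at which the cumulative length covers half the assembly
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "total": total,
        "mean": total / len(ordered),
        "n50": n50,
    }
```

Usage would look something like `assembly_stats(fasta_lengths(open("contigs.fasta")))`, where `contigs.fasta` is a placeholder for your own assembly file.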
It all depends a little on how serious (high quality) you want the result to be. If you only want a global idea of what it would look like, any of the pipelines you mention will do, I assume.
The protocol as described for the sunflower paper will deliver a nice result but is much more work (compute / time / man / ...) to run than the pipeline packages. Being a big fan of EuGene, I can certainly recommend that one, but keep in mind it will require some tweaking and time investment to obtain the best result.
Generally, keep in mind that the bigger your genome is and the more data you want to feed into the pipelines, the more computational power and time you will need.
Parameters may be dependent on your genome since "one size fits all" will not apply.
Be prepared to spend much longer on annotation than you did on the sequencing/assembly. While parts of the annotation can be automated, human intervention will be required in many places, and the fact that this genome is an allotetraploid is going to make your task that much harder.
If you intend to make the genome public then you can leverage NCBI's Eukaryotic annotation pipeline.
Edit: The "Request Annotation" link on NCBI's Eukaryotic annotation page is not working. I have emailed their support.
Edit 2: NCBI support agreed that the wording/link is misleading. Clicking on "Request Annotation" takes you to a help desk ticket page. You are supposed to fill out a request for annotation there. They will then get in touch with you.
Yes, EuGene is usually our 'end' tool, with which we combine all the other data, but many other recipes are possible.
You will definitely need to do parameter optimization, which can take quite some time, but the end result will reflect the effort you put into it!
I have never worked with a plant genome, but I've heard that MAKER has a dedicated pipeline adapted for plant genomes; maybe you can find what you need here: http://www.yandell-lab.org/software/maker-p.html
Correct, it is part of the MAKER "family". However, after all these years I still have to figure out what makes this one particularly suited to plant genomes, especially compared to the 'normal' MAKER.
The problem with all those pipelines is that there is no "one size fits all", as mentioned by genomax, while that's exactly what they try to offer.
Anyway, a good annotation can be made with several software packages, and a bad annotation can be made with any of them if you're not paying attention to the details ;-)
We have developed a gene annotator called FINDER, which can annotate eukaryotic genomes using short-read RNA-Seq data and protein sequences. It is completely automated and requires no manual intervention. FINDER also runs BRAKER to incorporate predicted genes in the repertoire. The FINDER paper is available, and the software is on GitHub.
In the hope of keeping this thread alive, I suggest the most accessible eukaryotic genome annotation pipeline/framework: MOSGA.
https://mosga.mathematik.uni-marburg.de
It is accessible through a web interface and does not require any installation, although it is also possible to install it locally or run it in a Docker container.
Gene prediction can be performed ab initio or based on RNA-Seq, protein, or orthology evidence.
Can you provide some stats about how good that assembly is now (# of contigs, average length, N50)? It is unlikely to be complete, so keep your expectations in line with that.
The genome is an allotetraploid, 3 Gb in size, in 5000 contigs. The N50 is 1.3 Mb.
Looks decent at first sight, but given that genome size, be prepared to spend time on it, as mentioned by genomax.
Hello, I want to convert AB1 files to FASTQ.
Please don't ask unrelated questions in pre-existing threads. Please open a new thread if you want to ask a well-formulated question.