De novo Genome Annotation without extra data
5.4 years ago
margab ▴ 10

I would like suggestions for tools or pipelines for de novo genome annotation that do not rely on transcriptional data (ESTs, RNA-seq, transcripts, Iso-Seq), proteins from my organism, protein hints of unknown evolutionary distance, or any other extra data.

Genome information: mammalian genome, 2.5 Gb in size. Illumina (36-fold coverage) and Nanopore (4-fold coverage) data were used for the assembly.

genome annotation • 817 views

Whilst you might not be able to leverage transcriptional data, perhaps your mammal has some "slightly distant relatives" whose predicted proteins you can use for annotation with MAKER. You could train Augustus with BUSCO and include proteins from the "distant relatives". I've done this before for a rodent genome and a turtle genome.
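As a rough illustration of the two steps suggested here, the sketch below only assembles a BUSCO command line and a fragment of MAKER options as text; it does not run either tool. The file names (`assembly.fa`, `relatives.faa`), the output name `busco_train`, the species label `my_mammal`, and both helper functions are hypothetical. The BUSCO flags follow the v5-style command line as far as I know it (`--augustus` to select Augustus, `--long` for its self-training mode), so check `busco --help` for your installed version before relying on them.

```python
def busco_training_command(genome_fasta, lineage, out_name, cpus=8):
    """Build (but do not run) a BUSCO genome-mode command that retrains Augustus.

    Assumption: BUSCO v5-style flags; older versions use a different syntax.
    """
    return [
        "busco",
        "-i", genome_fasta,   # genome assembly in FASTA format
        "-l", lineage,        # BUSCO lineage dataset, e.g. mammalia_odb10
        "-m", "genome",       # assess a genome assembly
        "-o", out_name,       # name of the output run
        "-c", str(cpus),      # number of CPU threads
        "--augustus",         # use Augustus as the gene predictor
        "--long",             # Augustus self-training (much longer run time)
    ]


def maker_opts_fragment(genome_fasta, protein_fasta, augustus_species):
    """Return the lines one might set in MAKER's maker_opts.ctl.

    Only a handful of keys are shown; a real control file has many more.
    """
    lines = [
        f"genome={genome_fasta}",
        f"protein={protein_fasta}",               # proteins from related species
        f"augustus_species={augustus_species}",   # Augustus parameters trained above
        "est2genome=0",                           # no transcript evidence available
        "protein2genome=0",                       # proteins as evidence, not direct models
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    cmd = busco_training_command("assembly.fa", "mammalia_odb10", "busco_train")
    print(" ".join(cmd))
    print(maker_opts_fragment("assembly.fa", "relatives.faa", "my_mammal"))
```

The retrained Augustus parameters that BUSCO writes out would still need to be copied into the Augustus config directory under the species name passed to MAKER; that step depends on the local installation and is not shown.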

5.4 years ago

Given that you cannot (or do not want to?) use extrinsic information, you have to rely on intrinsic (ab initio) prediction tools such as Augustus, EuGene, GeneMark and many others. The big issue is that, to get reasonably good results, you will have to train/optimise them for your organism, which is not a simple task (but doable!).

I'm wondering, however, why you want to do this without extrinsic data. As jean.elbers pointed out, there is valuable information in all the proteins known so far, even if they are not specifically from the organism you are working with. Transcript data might indeed be a little less straightforward, but the protein information will certainly be valuable!

In fact, genome annotation tools perform best when you combine intrinsic and extrinsic information. So perhaps reconsider using any available data source.
