DeNovo Assembly - Assigning gene description
3 months ago

Hi,

I generated a de novo plant genome assembly with a decent N50 value and BUSCO completeness score using MaSuRCA. I then performed structural annotation (gene prediction) with BRAKER2.

After the BRAKER2 run, I have the coding sequences and the .gff file. Now I want to assign gene descriptions to the CDS/genes in the .gff file.

Which tool or methodology can I use for that? I tried a BLAST alignment of the coding sequences against the NR database, but it has been running endlessly and seems very time-consuming.

Is there any tool for genome annotations similar to what Trinotate offers for transcriptomes? Any suggestions or ideas would be highly appreciated. I hope someone can help me out with this.

Thank you

braker2 DeNovo-Assembly masurca WGS • 431 views
3 months ago
Michael 55k

I'd try Funannotate, which is the most convenient annotation pipeline I know. While many public genome annotations are made with the Gnomon or Ensembl pipelines, it is hardly possible to install and run those locally without the support of their developer teams (possibly not even then). You would have to test whether Funannotate accepts your external BRAKER2 GFFs, but that shouldn't be a show-stopper.

See the documentation for funannotate annotate. Caveat: you still need to run InterProScan manually, and it is the heart of functional annotation.

If you need an alternative and want to submit your assembly to ENA, you can take a look at the Snakemake annotation pipeline we are using in the Earth BioGenome Project in Norway.

3 months ago
michael.ante ★ 3.9k

Running BLAST against NR seems like overkill. You can instead blast your CDS against UniProt locally with blastx: download all plant-related entries and build a BLAST database from them.
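A minimal sketch of that workflow in Python, under some assumptions: the BLAST+ programs makeblastdb and blastx are on the PATH, and the file names, database name, e-value cutoff, and thread count are placeholders chosen for illustration. The first function only builds the two command lines; the second keeps the highest-bitscore hit per query from the tabular output.

```python
def blast_commands(uniprot_fasta, cds_fasta, db_name="uniprot_plants",
                   out_tsv="cds_vs_uniprot.tsv", threads=8):
    """Build BLAST+ command lines (file and database names are placeholders)."""
    # Protein database built from the downloaded UniProt plant entries.
    make_db = ["makeblastdb", "-in", uniprot_fasta,
               "-dbtype", "prot", "-out", db_name]
    # blastx translates the nucleotide CDS and searches the protein database.
    search = ["blastx", "-query", cds_fasta, "-db", db_name,
              "-evalue", "1e-5", "-max_target_seqs", "5",
              "-outfmt", "6 qseqid sseqid pident evalue bitscore stitle",
              "-num_threads", str(threads), "-out", out_tsv]
    return make_db, search


def best_hits(rows):
    """Keep the highest-bitscore hit per query.

    Each row holds the six columns requested via -outfmt above.
    Returns {query_id: (subject_id, subject_title)}.
    """
    best = {}
    for qseqid, sseqid, _pident, _evalue, bitscore, stitle in rows:
        score = float(bitscore)
        if qseqid not in best or score > best[qseqid][0]:
            best[qseqid] = (score, sseqid, stitle)
    return {q: (sseqid, stitle) for q, (_, sseqid, stitle) in best.items()}
```

Each command list can be passed to subprocess.run(cmd, check=True), and the rows for best_hits can be read from the output file with csv.reader(handle, delimiter="\t").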

Alternatively, you can run HMMER3 on a translated protein FASTA, which you can create with, for example, transeq from EMBOSS.
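If EMBOSS is not at hand, the translation step can be sketched in plain Python. This is an illustrative stand-in for transeq, not its implementation, and it assumes the standard genetic code, CDS records that start in frame 1, and simple FASTA input. Codons containing ambiguous bases are translated to X.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), listed in TCAG codon order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate a CDS in frame 1; incomplete trailing codons are ignored."""
    seq = cds.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def translate_fasta(lines):
    """Yield (header, protein) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, translate("".join(chunks))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, translate("".join(chunks))
```

The trailing * from the stop codon is kept here. Some downstream tools reject it, so it may need to be stripped before searching.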

3 months ago
shelkmike ★ 1.5k

I prefer PANNZER2 (http://ekhidna2.biocenter.helsinki.fi/sanspanz/). Just upload a FASTA file with proteins, and PANNZER2 will give you a list of their descriptions.
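Whichever tool produces the descriptions, they still have to be written back into the .gff file. The following is a rough sketch of that last step, assuming a GFF3 file whose mRNA features carry an ID= attribute and a Python dict mapping those IDs to descriptions. The function name, the choice of the mRNA feature type, and the product attribute key are all illustrative and not taken from PANNZER2 or BRAKER2.

```python
from urllib.parse import quote


def annotate_gff3(lines, descriptions, feature_type="mRNA", key="product"):
    """Append key=<description> to matching GFF3 features.

    `descriptions` maps feature IDs to free-text descriptions
    (hypothetical input, e.g. parsed from an annotation tool's output).
    """
    out = []
    for line in lines:
        stripped = line.rstrip("\n")
        cols = stripped.split("\t")
        # Pass comments, directives, and other feature types through unchanged.
        if stripped.startswith("#") or len(cols) != 9 or cols[2] != feature_type:
            out.append(stripped)
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        desc = descriptions.get(attrs.get("ID"))
        if desc:
            # GFF3 reserves ; = , & and % in attribute values, so escape them.
            escaped = quote(desc, safe=" ()/:.-")
            cols[8] = cols[8].rstrip(";") + f";{key}={escaped}"
        out.append("\t".join(cols))
    return out
```

Features without a matching description are left untouched, so the function can be run repeatedly with results from different tools under different attribute keys.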
