Question: Genome annotation using COG
14 months ago, Paul wrote:

I have some new organisms that were assembled and scaffolded using SPAdes. I now have around 1000 scaffolds for each genome, and I want to functionally annotate them against COG.

I tried WebMGA, but it requires protein sequences as input, and I only have nucleotide scaffolds for each genome. How can I functionally annotate the genomes starting from the scaffolds?


Could you say what kind of organism it is?

I mean, is it a bacterial genome you are trying to assemble, or a eukaryotic organism?

If it is a bacterial genome and you are getting 1000 scaffolds/contigs, you should validate the assembly before annotating it. Useful criteria include the total number of bases in the assembly (i.e. the assembled genome size), the N50 value, the number of contigs/scaffolds, the number of reads supporting the assembly, %GC, and the minimum contig/scaffold length. You can set a minimum length cutoff to prune short contigs: 200 bp is a common minimum, and if the assembly still covers the expected genome size with a 1000 bp cutoff, that is even better.
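To make these criteria concrete, here is a minimal Python sketch that computes them from a scaffolds FASTA file. The function names, the 200 bp default and the choice to compute GC over A/C/G/T only are my own assumptions for illustration; QUAST reports the same kind of numbers and is the better choice for real work.

```python
import sys


def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def assembly_stats(sequences, min_length=200):
    """Summarise the sequences that are at least min_length bases long."""
    kept = [s for s in sequences if len(s) >= min_length]
    lengths = sorted((len(s) for s in kept), reverse=True)
    total = sum(lengths)

    # N50: length of the contig at which the running sum (longest first)
    # reaches half of the total length. L50: how many contigs that takes.
    n50, l50, running = 0, 0, 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running * 2 >= total:
            n50, l50 = length, count
            break

    # GC content over unambiguous bases only (N and other codes are ignored).
    gc = sum(s.count("G") + s.count("C") for s in kept)
    acgt = gc + sum(s.count("A") + s.count("T") for s in kept)
    gc_percent = 100.0 * gc / acgt if acgt else 0.0

    return {
        "num_contigs": len(lengths),
        "total_length": total,
        "n50": n50,
        "l50": l50,
        "gc_percent": gc_percent,
    }


if __name__ == "__main__":
    # Usage: python assembly_stats.py scaffolds.fasta
    with open(sys.argv[1]) as handle:
        seqs = [seq for _, seq in parse_fasta(handle)]
    for cutoff in (200, 1000):
        print(cutoff, assembly_stats(seqs, min_length=cutoff))
```

Running it with both cutoffs shows how much of the total length is lost when short contigs are pruned, which is the comparison described above.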

To annotate your draft assembly against COG you need protein sequences. You can get them by running gene prediction on the assembled contigs/scaffolds with tools such as Prokka, Prodigal, GeneMark, Glimmer, MAKER, AUGUSTUS and many others (Prokka, Prodigal and Glimmer target prokaryotes; MAKER and AUGUSTUS target eukaryotes). These tools can produce the predicted amino acid (i.e. protein) sequences in a file, usually with the extension .faa; these are your candidate protein-coding genes.

You can then use this protein file to annotate your assembly. In your case, you can use the eggNOG web server to annotate the predicted proteins against the COG database.

I hope this helps resolve your issue.
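As one possible way to run this step, here is a small Python wrapper around Prodigal. It assumes a prokaryotic genome and that the `prodigal` executable is on your PATH; the file names and the `build_prodigal_command` helper are hypothetical. The flags used (`-i`, `-a`, `-d`, `-o`, `-f`, `-p`) are Prodigal's input, protein output, gene nucleotide output, coordinate output, output format and procedure options.

```python
import subprocess
import sys


def build_prodigal_command(scaffolds, out_prefix, metagenome=False):
    """Return the Prodigal command line as a list of arguments."""
    return [
        "prodigal",
        "-i", str(scaffolds),          # nucleotide scaffolds (FASTA)
        "-a", f"{out_prefix}.faa",     # predicted proteins, for COG annotation
        "-d", f"{out_prefix}.ffn",     # nucleotide sequences of predicted genes
        "-o", f"{out_prefix}.gff",     # gene coordinates
        "-f", "gff",                   # format of the coordinate file
        "-p", "meta" if metagenome else "single",
    ]


if __name__ == "__main__":
    # Usage: python run_prodigal.py scaffolds.fasta sample1
    command = build_prodigal_command(sys.argv[1], sys.argv[2])
    print("Running:", " ".join(command))
    subprocess.run(command, check=True)  # raises if Prodigal exits with an error
```

The resulting `.faa` file is the protein input that WebMGA or eggNOG expects. For a eukaryotic genome this sketch does not apply, and a eukaryotic gene predictor would be needed instead.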

written 14 months ago by Nitin Narwade

Thank you so much for the detailed explanation, @Nitin. I ran a QUAST quality analysis after the assembly, which reported the statistics below. Please let me know whether I can go ahead with this assembly.

Statistics without reference
# contigs 5801
N50 265554
N75 12966
L50 88
L75 198
GC (%) 98
written 14 months ago by Paul

Which organism are you working on, and what is its approximate genome size?

Let's say the typical genome size for your organism is 7.5-8 Mb; then your assembly is good. If it is around 5 Mb, the assembly very likely contains contamination.
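The comparison I have in mind can be sketched in a few lines of Python. The total assembly length is not in the statistics posted above (QUAST reports it as "Total length"), and the 10% tolerance and the function name are arbitrary assumptions of mine, not an established threshold.

```python
def compare_to_expected_size(assembly_bp, expected_min_bp, expected_max_bp,
                             tolerance=0.10):
    """Classify an assembly length against an expected genome size range.

    The tolerance widens the expected range on both sides; 10% is an
    illustrative choice only.
    """
    low = expected_min_bp * (1 - tolerance)
    high = expected_max_bp * (1 + tolerance)
    if assembly_bp > high:
        # Extra sequence: possible contamination or duplicated regions.
        return "larger"
    if assembly_bp < low:
        # Missing sequence: possibly an incomplete assembly.
        return "smaller"
    return "within"


# Hypothetical 8 Mb assembly checked against the two scenarios above.
print(compare_to_expected_size(8_000_000, 7_500_000, 8_000_000))
print(compare_to_expected_size(8_000_000, 5_000_000, 5_000_000))
```

A "larger" result is only a hint; confirming contamination would need something like a taxonomic check of the contigs.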

Thank you.

written 14 months ago by Nitin Narwade

Thank you, @Nitin. The genome size is around 8 Mb.

written 14 months ago by Paul