Question

Genome annotation using COG

0

5.3 years ago

Paul ▴ 80

I have some new organisms that were assembled and scaffolded using SPAdes. I now have around 1000 scaffolds for each genome, and I want to functionally annotate the scaffolds against COG.

I tried using WebMGA. However, it requires protein sequences as input, and I only have nucleotide sequences (the scaffolds) for each genome. How do I functionally annotate the genome using the scaffolds?

sequencing annotation COG • 3.5k views

ADD COMMENT • link 5.3 years ago by Paul ▴ 80

1

Could you explain what kind of organism it is?

I mean, is it a bacterial genome you're trying to assemble, or are you working on a eukaryotic organism?

If it is a bacterial genome and you're getting 1000 scaffolds/contigs, then you really need to look into this by validating the assembly. You can validate it using a number of criteria: the total number of bases in the assembly (i.e. genome size), the N50 value, the number of contigs/scaffolds, the total number of reads supporting the assembly, the minimum contig/scaffold length, %GC, etc. You can also set a minimum scaffold/contig length to prune the number of contigs: ideally the cutoff is 200 bp, and if the assembly still covers your genome with a 1000 bp cutoff, even better.
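Several of these criteria can be computed directly from the scaffold FASTA. Below is a minimal Python sketch, not a replacement for a dedicated tool such as QUAST; the function names `fasta_lengths` and `assembly_stats` and the `min_len` parameter are made up for illustration, and N50 is taken here as the length of the contig at which the running total (longest first) reaches half of the assembly size.

```python
def fasta_lengths(lines):
    """Yield the length of each sequence found in FASTA-formatted lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def assembly_stats(lengths, min_len=0):
    """Summarise contig lengths after dropping contigs shorter than min_len."""
    kept = sorted((n for n in lengths if n >= min_len), reverse=True)
    total = sum(kept)
    running = 0
    n50 = l50 = 0
    for rank, n in enumerate(kept, start=1):
        running += n
        # First contig at which the cumulative length reaches half the total.
        if running * 2 >= total:
            n50, l50 = n, rank
            break
    return {"contigs": len(kept), "total_bp": total, "N50": n50, "L50": l50}
```

To use it on a real assembly, open the file and pass the handle, e.g. `assembly_stats(fasta_lengths(open("scaffolds.fasta")), min_len=1000)`, where `scaffolds.fasta` is a placeholder file name. Comparing the result with and without the length cutoff shows how much of the assembly sits in very short contigs.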

If you want to annotate your draft assembly against COG, you need protein sequences. In this case, you can perform gene prediction on the assembled contigs/scaffolds using tools such as Prokka, Prodigal, GeneMark, Glimmer, MAKER, AUGUSTUS and many more. All of these tools generate amino acid (i.e. protein) sequences in a file (generally with the extension .faa), which will be your potential protein-coding genes.

You can use this file (containing the protein sequences) to annotate your assembly. In your case, you can use the eggNOG web server to annotate your predicted proteins against the COG database.

Hope this helps resolve your issue.
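As one illustration of the gene prediction step, here is a small Python sketch that wraps Prodigal. It assumes a prokaryotic genome and that the `prodigal` executable is installed and on your PATH; the helper names and file names are placeholders, and the flags (`-i` input, `-a` protein translations, `-o` gene coordinates, `-f` output format, `-p` mode) are given as I understand them from Prodigal's help text, so please check them against `prodigal -h` for your version.

```python
import subprocess


def prodigal_command(scaffolds, proteins, gff, metagenome=False):
    """Build the Prodigal command line as a list of arguments."""
    return [
        "prodigal",
        "-i", scaffolds,   # nucleotide scaffolds/contigs (FASTA)
        "-a", proteins,    # predicted protein sequences (.faa)
        "-o", gff,         # gene coordinates
        "-f", "gff",       # write the coordinates in GFF format
        "-p", "meta" if metagenome else "single",
    ]


def predict_proteins(scaffolds, proteins, gff):
    """Run Prodigal; raises CalledProcessError if it exits with an error."""
    subprocess.run(prodigal_command(scaffolds, proteins, gff), check=True)
```

Calling `predict_proteins("scaffolds.fasta", "proteins.faa", "genes.gff")` should leave you with a `proteins.faa` file that can be submitted to a protein-based annotation service. For a eukaryotic genome a different predictor (e.g. AUGUSTUS or MAKER) would be needed instead.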

ADD REPLY • link 5.3 years ago by Nitin Narwade ★ 1.6k

0

Thank you so much for the detailed explanation @Nitin. I ran a QUAST quality analysis after the assembly, which gave me the following details. Please let me know if I can go ahead with the assembly.

Statistics without reference
# contigs 5801
N50 265554
N75 12966
L50 88
L75 198
GC (%) 98

ADD REPLY • link 5.3 years ago by Paul ▴ 80

1

What organism are you working on, and what is its approximate genome size?

Let's say the average genome size for your organism is 7.5-8 Mb; then your assembly is good. If it is 5 Mb, then the assembly very likely contains contamination.
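The size comparison can be written down as a quick check. This is only a sketch: the function names and the 20% tolerance are arbitrary choices for illustration, and the 8 Mb assembly total used in the example is assumed, not taken from the QUAST output above.

```python
def size_excess(assembly_bp, expected_bp):
    """Fraction by which the assembly exceeds (or, if negative, falls short of) the expected genome size."""
    return (assembly_bp - expected_bp) / expected_bp


def looks_oversized(assembly_bp, expected_bp, tolerance=0.2):
    """True when the assembly is more than `tolerance` larger than expected."""
    return size_excess(assembly_bp, expected_bp) > tolerance
```

With an assumed 8 Mb assembly, `looks_oversized(8_000_000, 8_000_000)` is `False`, while `looks_oversized(8_000_000, 5_000_000)` is `True`, since the assembly would be 60% larger than expected. An oversized assembly is a hint to look for contamination, not proof of it.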

Thank you.

ADD REPLY • link 5.3 years ago by Nitin Narwade ★ 1.6k

0

Thank you @Nitin, the genome size is around 8 Mb.

ADD REPLY • link 5.3 years ago by Paul ▴ 80