Question: Genome annotation using COG
Paul80 (India) wrote 6 months ago:

I have some new organisms that were assembled and scaffolded using SPAdes. Now I have around 1000 scaffolds for each organism's genome. I want to functionally annotate the scaffolds against COG.

I tried using WebMGA. However, it requires protein sequences as input, and I have nucleotide sequences (the scaffolds) for each genome. How do I functionally annotate the genomes using the scaffolds?

sequencing cog annotation • 329 views

Could you explain what kind of organism it is?

I mean, is it a bacterial genome you're trying to assemble, or a eukaryotic organism you're working on?

If it is a bacterial genome and you're getting 1000 scaffolds/contigs, then you really should look into this by validating the assembly. You can validate an assembly against a number of criteria: the total number of bases in the assembly (i.e. genome size), the N50 value, the number of contigs/scaffolds, the total number of reads supporting the assembly, the minimum contig/scaffold length, %GC, etc. You can set a minimum contig/scaffold length to prune the number of contigs. Ideally the cutoff should be 200 bp, and if the assembly still covers your genome with a cutoff of 1000 bp, that would be great.
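As a rough illustration of the criteria above, here is a minimal Python sketch that computes the contig count, total bases, N50, L50 and %GC after applying a minimum length cutoff. The function names (`read_fasta`, `assembly_stats`) and the file name in the usage note are made up for this example; a dedicated tool such as QUAST reports the same numbers and more.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def assembly_stats(seqs: Iterable[str], min_len: int = 200) -> Dict[str, float]:
    """Summarise the contigs that are at least min_len bases long."""
    kept = [s.upper() for s in seqs if len(s) >= min_len]
    lengths = sorted((len(s) for s in kept), reverse=True)
    total = sum(lengths)

    # N50: length of the contig at which the running sum (longest first)
    # reaches half of the total assembly length. L50: how many contigs that took.
    n50 = l50 = 0
    running = 0
    for rank, length in enumerate(lengths, start=1):
        running += length
        if running * 2 >= total:
            n50, l50 = length, rank
            break

    gc_bases = sum(s.count("G") + s.count("C") for s in kept)
    return {
        "contigs": len(lengths),
        "total_bases": total,
        "N50": n50,
        "L50": l50,
        "GC_percent": 100.0 * gc_bases / total if total else 0.0,
    }
```

With a real assembly you could call it as `assembly_stats((seq for _, seq in read_fasta(open("scaffolds.fasta"))), min_len=1000)`, where `scaffolds.fasta` is a placeholder for your own file, and compare the result with the 200 bp cutoff.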

If you want to annotate your draft assembly against COG, you need protein sequences. In this case you can perform gene prediction on the assembled contigs/scaffolds using tools such as Prokka, Prodigal, GeneMark, Glimmer, MAKER, AUGUSTUS and many more. All of these tools write amino acid (i.e. protein) sequences to a file (generally with the extension .faa), which will be your potential protein-coding genes.
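For example, assuming the genome is prokaryotic and Prodigal is installed, the gene prediction step could be wrapped in Python roughly like this. The file names and the two helper functions are placeholders for this sketch. The flags used are Prodigal's `-i` (input FASTA), `-a` (protein translations), `-o` (gene coordinates) and `-f` (output format).

```python
import shutil
import subprocess
from typing import List


def prodigal_command(scaffolds: str, proteins_out: str, genes_out: str) -> List[str]:
    """Build a Prodigal call that writes predicted proteins and gene coordinates."""
    return [
        "prodigal",
        "-i", scaffolds,      # nucleotide scaffolds/contigs (FASTA)
        "-a", proteins_out,   # predicted protein sequences (.faa)
        "-o", genes_out,      # gene coordinates
        "-f", "gff",          # write the coordinates in GFF format
    ]


def predict_proteins(scaffolds: str,
                     proteins_out: str = "proteins.faa",
                     genes_out: str = "genes.gff") -> bool:
    """Run Prodigal if it is on PATH; return True if it was run."""
    cmd = prodigal_command(scaffolds, proteins_out, genes_out)
    if shutil.which("prodigal") is None:
        print("prodigal not found on PATH; would run:", " ".join(cmd))
        return False
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return True
```

Calling `predict_proteins("scaffolds.fasta")` would then leave the protein sequences in `proteins.faa`, which is the kind of file the COG annotation step expects. For a eukaryotic genome a different predictor (e.g. AUGUSTUS or MAKER) would be needed instead.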

You can use this file (containing the protein sequences) to annotate your assembly. In your case you can use the eggNOG web server to annotate your predicted proteins against the COG database.

Hope this helps resolve your issue.

written 6 months ago by Nitin Narwade420

Thank you so much for the detailed explanation @Nitin. I ran a QUAST quality analysis after the assembly, which gave the following details. Please let me know if I can go ahead with the assembly.

Statistics without reference
# contigs 5801
N50 265554
N75 12966
L50 88
L75 198
GC (%) 98
written 6 months ago by Paul80

What is the organism you're working on, and what is its approximate genome size?

Let's say the average genome size for your organism is 7.5-8 Mb. Then your assembly is good. If it is 5 Mb, then the assembly very likely contains contamination.
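The comparison I have in mind can be sketched as a simple ratio check. The 15% tolerance and the function name are arbitrary choices for illustration, not an established threshold, and the assembly length in the usage note is a hypothetical value, since the total length is not in the statistics posted above.

```python
def flag_assembly_size(assembly_bases: int, expected_bases: int,
                       tolerance: float = 0.15) -> str:
    """Compare total assembly length with the expected genome size.

    tolerance is an assumed cutoff (fraction of the expected size),
    chosen only for illustration.
    """
    if expected_bases <= 0:
        raise ValueError("expected_bases must be positive")
    ratio = assembly_bases / expected_bases
    if ratio > 1 + tolerance:
        return "larger than expected: check for contamination"
    if ratio < 1 - tolerance:
        return "smaller than expected: assembly may be incomplete"
    return "consistent with expected size"
```

For instance, a hypothetical 8 Mb assembly checked with `flag_assembly_size(8_000_000, 7_750_000)` is consistent with a 7.5-8 Mb genome, while `flag_assembly_size(8_000_000, 5_000_000)` is flagged as larger than expected.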

Thank you.

written 6 months ago by Nitin Narwade420

Thank you @Nitin, the genome size is around 8 Mb.

written 6 months ago by Paul80