Is it better to annotate contigs or scaffolds?
7.4 years ago
mgalactus ▴ 760

Hi,

I'm annotating some bacterial genomes, and I was wondering whether it makes more sense to annotate the contigs and then scaffold them, or whether it would be fine to annotate the scaffolds directly. I'm planning to submit these genomes to NCBI, so the result should comply with their standards as well.

Thanks

bacteria contigs scaffolds annotation • 4.9k views
7.4 years ago
dago ★ 2.7k

I would say that annotating scaffolds makes much more sense. One scaffold can be made up of many contigs, and at the end of a contig you may find a CDS or a gene cluster broken in the middle. This can produce incorrect annotation or give you only partial information on the gene order in the genome. In the scaffolds, this bias should likely be reduced.

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  -> scaffold
xxxxxx                     xxxxxxxxxxxxx
xxxxxxxxxxxx                          xxxxxxxxxxxxx -> contigs
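
To make the point above concrete, here is a minimal Python sketch that recovers the contig spans from a scaffold and checks whether a feature (for example a CDS) fits inside a single contig or crosses a gap. It assumes gaps are encoded as runs of N and that the contigs themselves contain no N's; the function names `contig_spans` and `spans_gap` are hypothetical and not part of any annotation tool.

```python
import re

def contig_spans(scaffold):
    """Return 0-based, end-exclusive (start, end) spans of the non-N runs."""
    return [m.span() for m in re.finditer(r"[^Nn]+", scaffold)]

def spans_gap(feature_start, feature_end, spans):
    """True if the feature is not fully contained in a single contig."""
    return not any(s <= feature_start and feature_end <= e for s, e in spans)

# Toy scaffold: two contigs joined by a gap of 10 N's
scaffold = "ACGTAC" + "N" * 10 + "GGCCTT"
spans = contig_spans(scaffold)

inside_one_contig = spans_gap(1, 5, spans)   # feature lies within the first contig
crosses_the_gap = spans_gap(4, 18, spans)    # feature starts in one contig, ends in the other
```

A feature flagged by `spans_gap` would be split or truncated if the contigs were annotated separately, which is the situation described above.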


Thanks for the reply: you are probably right, but do you know whether NCBI wants the scaffolding information to be provided in some way?


Take a look here

7.4 years ago
HG ★ 1.2k

Here is an email response I got from NCBI a long time ago:

We do accept gapped submissions if N's represent gaps between ordered and oriented contiguous sequences. If you are using estimated gap sizes, then the number of N's should exactly match the estimated gap size. If you are unsure of the gap size, you should add 100 N's in the sequence file.

Please note we offer two submission pathways (Complete and WGS):

1. The genome assembly could be submitted as a complete genome if it falls into either of these cases:

a. You have sequenced the complete circular genome and there are no gaps
b. You know the order and orientation of the contigs and were able to assemble your sequences, with Ns between the contigs, into a single scaffold representing the circular genome with no extra unplaced contigs

Genomes in the complete category should be submitted as .sqn files with or without annotation using GenomesMacroSend (http://www.ncbi.nlm.nih.gov/projects/GenomeSubmit/genome_submit.cgi) as described in http://www.ncbi.nlm.nih.gov/Genbank/genomesubmit.html.

2. If the genome assembly is in multiple pieces that you were unable to assemble into a complete chromosome, then submit the contigs to our Whole Genome Shotgun (WGS) database using the WGS submission portal (https://submit.ncbi.nlm.nih.gov/subs/wgs/). See the WGS page, http://www.ncbi.nlm.nih.gov/Genbank/wgs.submit.html for details.
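
The gap convention described at the start of the email (the number of N's matches the estimated gap size, and 100 N's are used when the size is unknown) can be sketched in Python as follows. This is only an illustration under the assumption that the contigs are already ordered and oriented; `build_scaffold` and `UNKNOWN_GAP` are hypothetical names, not part of any NCBI tool.

```python
UNKNOWN_GAP = 100  # per the email: use 100 N's when the gap size is unknown

def build_scaffold(contigs, gap_sizes):
    """Join ordered, oriented contigs with runs of N.

    contigs:   list of sequence strings, already ordered and oriented.
    gap_sizes: one entry per junction; an int is an estimated gap size,
               None means the size is unknown.
    """
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        n_count = UNKNOWN_GAP if gap is None else gap
        parts.append("N" * n_count)
        parts.append(contig)
    return "".join(parts)

# Three toy contigs: first gap estimated at 5 bp, second gap of unknown size
scaffold = build_scaffold(["ACGT", "GGCC", "TTAA"], [5, None])
```

Keeping the gap sizes in a separate list makes it explicit which gaps are estimates and which are placeholders.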