How to submit bacterial genomes to the database?
6.8 years ago
olp123 ▴ 20

Dear all,

Which database is commonly used for submitting bacterial genomes? I have a genome assembly in a single FASTA file containing ~150 sequences (Illumina MiSeq). Some of these sequences are shorter than 200 nucleotides; they are mostly homopolymeric DNA stretches. When I try to submit to NCBI, I cannot complete the process because sequences <200 nucleotides are not allowed. If I simply delete the short sequences, am I not distorting the data? As you can probably tell, this is my first genome submission. The NCBI manual does not say how to deal with short sequences. Could anyone please tell me how I should proceed, and why?

Thank you very much!

genome submission bacteria • 1.9k views

Note that submitting low-quality data to public databases makes life harder in perpetuity for everyone using those databases. Please put a lot of effort into curating the data yourself prior to submission to ensure that the genomes are pure (uncontaminated), represent the correct species, and are as complete and contiguous as possible. NCBI has some automated checks to prevent low-quality submissions from degrading the databases, as you can see, but they are not foolproof. I suggest you study the matter a bit more before submitting anything.
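As a rough illustration of the "complete and contiguous" part of this advice, here is a minimal sketch that summarises a set of contig lengths before submission. The function names (`n50`, `assembly_summary`) and the 200-nucleotide default are assumptions made for this example, not part of any NCBI tool, and the summary says nothing about contamination or species identity, which need separate checks.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Return the N50: the contig length at which the cumulative sum of
    lengths, taken from longest to shortest, reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


def assembly_summary(lengths: List[int], min_length: int = 200) -> Dict[str, int]:
    """Collect a few basic contiguity figures for a list of contig lengths.

    min_length is the submission cut-off mentioned in the question;
    it is an assumption here and can be changed.
    """
    short = [n for n in lengths if n < min_length]
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
        "below_min_length": len(short),
        "bp_below_min_length": sum(short),
    }


# Hypothetical contig lengths, for illustration only.
example_lengths = [120000, 85000, 40000, 1500, 180, 150, 90]
summary = assembly_summary(example_lengths)
for key, value in summary.items():
    print(f"{key}: {value}")
```

If the short contigs account for only a tiny fraction of the total bases, as in the made-up numbers here, dropping them changes the assembly very little.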

6.8 years ago

Very short sequences are not informative and may easily come from contamination (another, unknown organism). That is why these contigs are so short: there was little supporting evidence for their validity. Hence it is a tradeoff between quality and quantity.

It is perfectly fine to leave these short sequences out and not make use of them; it is usually for the better.
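For what it's worth, leaving out the short sequences can be done with a few lines of Python. The following is only a sketch: the helper names (`read_fasta`, `split_by_length`, `write_fasta`) are made up for this example, the 200-nucleotide threshold is taken from the question, and the toy records stand in for a real assembly file. It keeps the dropped records in a separate list so you can still inspect them (for example, to check whether they really are homopolymers or contamination).

```python
import io
from typing import Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)

MIN_LENGTH = 200  # cut-off reported in the question; adjust if needed


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(records: Iterable[Record],
                    min_length: int = MIN_LENGTH) -> Tuple[List[Record], List[Record]]:
    """Separate records into (kept, dropped) by sequence length."""
    kept: List[Record] = []
    dropped: List[Record] = []
    for header, seq in records:
        (kept if len(seq) >= min_length else dropped).append((header, seq))
    return kept, dropped


def write_fasta(records: Iterable[Record], handle: TextIO, width: int = 60) -> None:
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")


# Toy input standing in for a real assembly file (hypothetical contig names).
toy_fasta = io.StringIO(
    ">contig_1\n" + "ACGT" * 75 + "\n"   # 300 nt, kept
    ">contig_2\n" + "A" * 150 + "\n"     # 150 nt homopolymer, dropped
)
kept, dropped = split_by_length(read_fasta(toy_fasta))
out = io.StringIO()
write_fasta(kept, out)
print(f"kept {len(kept)} sequence(s), dropped {len(dropped)}")
```

For a real file, pass an open file handle to `read_fasta` instead of the `io.StringIO` object, and write the kept records to a new file rather than overwriting the original assembly.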

6.8 years ago
vmicrobio ▴ 290

Which bacterium are you working on? If a closely related reference genome is available, I would recommend building a consensus sequence against it.

6.8 years ago
olp123 ▴ 20

It's Kocuria. Do you have an easy to use program in mind (for a beginner)?

Thank you all. I will definitely have to study more.


You can use a toolkit such as PAGIT. It bundles several programs for improving a draft genome: ABACAS, IMAGE, iCORN and RATT. Once you have generated your contigs, these let you build scaffolds against a reference, close gaps, correct the sequence by iterative read mapping, and transfer the genome annotation. See the publication.
