Question

Can I assemble a bunch of contigs of a bacterial genome to get the full genome sequence?


5.0 years ago

AnonymousEngineer • 0

Hi everyone,

I am fairly new to bioinformatics and I am trying to wrap my head around genome sequencing and assembly. I need to analyze the genome of a bacterium called Shewanella benthica KT99 to predict the genes it encodes. However, the reported assembly level of the genome is "Contig". I understand that contigs are fragments of the genome for which the order of the bases is known to be correct. However, I don't quite understand why the authors of the paper reporting the draft genome sequence could not assemble it to the "Complete genome" level. I have attached the information I got from NCBI regarding the bacterium. On another note, there is a very closely related bacterium called Shewanella piezotolerans WP3 which has an assembly level of "Complete genome" on NCBI. Both were sequenced using ABI 3730 family DNA sequencers, and the two projects were separated by only a year or two. Why are the assembly levels different? Below are the details of my bacterium of interest.

So far I have used an established pipeline to work with complete genome sequences of bacteria. So is there a way to just take all these contigs and assemble them together to obtain the full genome of this bacterium? If so, how do I do it, and which open-source software packages can I use?

Thank you in advance!

Anby

*ASM17207v1

Organism name: Shewanella benthica KT99 (g-proteobacteria)

Infraspecific name: Strain: KT99

BioSample: SAMN02436096

BioProject: PRJNA13387

Submitter: The Gordon and Betty Moore Foundation Marine Microbiology Initiative

Date: 2007/11/28

Assembly level: Contig

Genome representation: full

RefSeq category: representative genome

GenBank assembly accession: GCA_000172075.1 (latest)

RefSeq assembly accession: GCF_000172075.1 (latest)

RefSeq assembly and GenBank assembly identical: yes

WGS Project: ABIC01*

assembly sequencing alignment sequence • 1.3k views



Thank you for the explanations, I appreciate it. I will try contacting the authors to ask whether the original sequencing reads are available. What would be a good program of choice when it comes to assembly?

On another note, can the sequencing coverage and / or quality be improved by repeating the sequencing process multiple times?

Anby

ADD REPLY • link 5.0 years ago by AnonymousEngineer • 0


Maybe. But short reads can only do so much to resolve long repeats. Paired-end data will help more than single-end reads.

ADD REPLY • link 5.0 years ago by swbarnes2 14k

score 1 · Answer 1 · 2019-05-07

I need to analyze this genome of this bacteria called Shewanella benthica KT99 to predict genes encoded in its genome

The genome has been annotated by NCBI ( ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/172/075/GCF_000172075.1_ASM17207v1 ) and Ensembl ( http://bacteria.ensembl.org/Shewanella_benthica_kt99/Info/Index ), so you can just use one of those annotations instead of doing a new one.
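If you go with an existing annotation, a quick sanity check is to count the feature types it contains. The snippet below is only a minimal sketch: it assumes the annotation is in standard tab-separated, nine-column GFF3, and the contig IDs and coordinates in `SAMPLE_GFF` are made up for illustration rather than taken from the real file.

```python
from collections import Counter
import io

# Tiny made-up GFF3 excerpt standing in for a real annotation file.
# Contig IDs, coordinates and feature IDs are hypothetical.
SAMPLE_GFF = """\
##gff-version 3
contig_1\tRefSeq\tgene\t100\t1200\t.\t+\t.\tID=gene0
contig_1\tRefSeq\tCDS\t100\t1200\t.\t+\t0\tID=cds0;Parent=gene0
contig_2\tRefSeq\tgene\t50\t400\t.\t-\t.\tID=gene1
contig_2\tRefSeq\tCDS\t50\t400\t.\t-\t0\tID=cds1;Parent=gene1
contig_2\tRefSeq\ttRNA\t900\t975\t.\t+\t.\tID=rna0
"""


def count_feature_types(handle):
    """Count how often each feature type (GFF3 column 3) occurs."""
    counts = Counter()
    for line in handle:
        # Skip directives, comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # A well-formed GFF3 feature line has exactly nine columns.
        if len(fields) != 9:
            continue
        counts[fields[2]] += 1
    return counts


counts = count_feature_types(io.StringIO(SAMPLE_GFF))
for feature_type, n in sorted(counts.items()):
    print(feature_type, n)
```

For a downloaded annotation you would pass an open file handle instead of the `io.StringIO` wrapper, for example one returned by `gzip.open(path, "rt")` if the file is gzip-compressed.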

So is there a way to just take all these contigs and assemble them together to obtain the full genome of this bacterium?

Most likely no. If the original sequencing reads are available, you can try to assemble them again with a different program. Trying to assemble just the contigs most likely won't improve the assembly, and may even introduce misassemblies.

However I don't quite understand why the authors of the paper reporting the draft genome sequence could not assemble it to a genome assembly level of complete genome.

The reason could be technical (for example, low sequencing coverage and/or poor sequencing quality) or biological (the genome contains repeats longer than the read length), or something else entirely. There is not enough information to reach a conclusion.
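To get a feel for the coverage side of this, here is a rough sketch based on the simplified Lander-Waterman model, which assumes reads land uniformly at random and ignores the minimum overlap needed to join two reads. Under that model, coverage is c = N·L/G, the fraction of bases left uncovered is about e^(-c), and the expected number of contigs is about N·e^(-c). The genome size, read length and read counts below are illustrative placeholders, not figures for this assembly.

```python
import math


def lander_waterman(genome_size, read_length, n_reads):
    """Simplified Lander-Waterman estimates (minimum overlap ignored).

    Returns (coverage, fraction of bases uncovered, expected contig count).
    """
    coverage = n_reads * read_length / genome_size
    frac_uncovered = math.exp(-coverage)
    expected_contigs = n_reads * math.exp(-coverage)
    return coverage, frac_uncovered, expected_contigs


# Hypothetical numbers chosen only for illustration.
GENOME_SIZE = 4_500_000   # bp, placeholder bacterial genome size
READ_LENGTH = 800         # bp, placeholder Sanger-like read length

for n_reads in (10_000, 30_000, 60_000):
    c, uncovered, contigs = lander_waterman(GENOME_SIZE, READ_LENGTH, n_reads)
    print(f"reads={n_reads:>6}  coverage={c:5.1f}x  "
          f"uncovered={uncovered:.2e}  expected contigs={contigs:8.1f}")
```

The model says nothing about repeats, so a real assembly can stay fragmented even when this estimate predicts a single contig.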

score 1 · Answer 2 · 2019-05-07

In general, one of the things that keeps contigs from being joined is repetitive elements. (Imagine trying to assemble a chopped-up version of "The Raven": the refrain recurs so often that many fragments could plausibly follow it.) This probably isn't something that can be fixed if all you have at your disposal are short reads. But if coding genes are uninterrupted, the assembly might be good enough for your purposes.
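The repeat problem can be illustrated with a toy example. The sketch below uses a de Bruijn-style view (an assumption made for illustration, not a claim about how this genome was assembled): every k-mer stands in for an error-free read of length k, and a (k-1)-mer followed by more than one distinct base marks a point where the assembler cannot tell which way to continue. The sequence is made up and contains one 8-base repeat with different flanks.

```python
from collections import defaultdict


def branching_nodes(genome, k):
    """Return the (k-1)-mers that are followed by more than one distinct base."""
    successors = defaultdict(set)
    for i in range(len(genome) - k + 1):
        kmer = genome[i:i + k]
        # Record which base follows this (k-1)-mer prefix.
        successors[kmer[:-1]].add(kmer[-1])
    return {node: nxt for node, nxt in successors.items() if len(nxt) > 1}


# Made-up toy sequence: the same 8-base repeat appears twice,
# each time with different flanking bases.
REPEAT = "ACGTACCA"
GENOME = "TTGAC" + REPEAT + "GGCTA" + REPEAT + "CATTG"

for k in (5, 9, 10):
    branches = branching_nodes(GENOME, k)
    print(f"k={k:2d}: {len(branches)} ambiguous node(s)", sorted(branches))
```

With k = 5 or k = 9 the walk reaches the end of the repeat and has two possible continuations, so the contig has to stop there. With k = 10 every window reaches past the 8-base repeat into unique sequence and the ambiguity disappears, which is the toy version of reads that are longer than the repeat.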