How To Improve Whole Genome Assembly Quality
10.6 years ago
HG ★ 1.2k

Hi everyone, I am new to sequence assembly. I have started a project with Illumina whole-genome sequencing data for 50 E. coli isolates. I assembled all of them with SPAdes and checked the quality with QUAST. On average I got around 100 contigs per genome. Can anyone suggest how to improve the assembly quality, i.e. how to reduce the number of contigs, increase the N50, and reduce the gaps?

Thank you in advance for any suggestions.
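
For readers unfamiliar with the metrics in the question, contig count and N50 can be reproduced from the contig lengths alone. The following is a minimal sketch, assuming a plain FASTA input; the function names are illustrative and are not part of QUAST. QUAST also applies a minimum contig length filter by default, so its numbers can differ slightly from an unfiltered calculation.

```python
def contig_lengths(fasta_lines):
    """Return the sequence length of each record in an iterable of FASTA lines."""
    lengths = []
    current = 0
    seen_header = False
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seen_header:
                lengths.append(current)
            seen_header = True
            current = 0
        elif line:
            current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Largest length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly
```

To use it on a real assembly, pass an open file handle, for example `n50(contig_lengths(open("contigs.fasta")))`, where `contigs.fasta` is a placeholder path.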


Why don't you just map the reads to a reference and call variants?

10.6 years ago
5heikki 11k

Do multiple assemblies with different k-mer settings and merge them at the end into a final assembly. JGI has a decent pipeline for this, and it should be publicly available, though I couldn't locate a URL after a good 10 seconds on Google.
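
The first half of this suggestion (several assemblies with different k-mer settings) could be scripted roughly as below. This is a sketch under stated assumptions: `-1`, `-2`, `-k` and `-o` are standard SPAdes options, but the read file names, output directory names, and the particular k-mer lists are hypothetical placeholders. The merging step is not shown, because the pipeline referred to above is not named in the post.

```python
import shlex


def spades_command(r1, r2, outdir, kmers):
    """Build a spades.py command line for one paired-end sample and one k-mer list."""
    # SPAdes only accepts odd k-mer sizes.
    if any(k % 2 == 0 for k in kmers):
        raise ValueError("SPAdes expects odd k-mer sizes")
    k_arg = ",".join(str(k) for k in sorted(kmers))
    return ["spades.py", "-1", r1, "-2", r2, "-k", k_arg, "-o", outdir]


# Hypothetical k-mer sets to compare; choose values shorter than your read length.
kmer_sets = [[21, 33, 55], [21, 33, 55, 77], [33, 55, 77, 99]]

for kmers in kmer_sets:
    outdir = "asm_k" + "_".join(str(k) for k in kmers)
    cmd = spades_command("sample_R1.fastq.gz", "sample_R2.fastq.gz", outdir, kmers)
    # Print rather than execute, so the commands can be reviewed first;
    # subprocess.run(cmd, check=True) would launch the assembly.
    print(shlex.join(cmd))
```

Each output directory can then be evaluated with QUAST to see which k-mer set gives the fewest contigs and the highest N50 for a given sample.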


Spades does this already, although "only" with three values of K by default.

10.6 years ago
Stroehli ▴ 40

I don't know how the scaffolding step in SPAdes works, but trying an additional stand-alone scaffolder such as SSPACE (using paired-end information) or Scaffold_builder (using a completed genome as a reference) could help. The latter should be relatively straightforward for E. coli, as a good genomic reference is available. This way you can get a better genomic structure (longer scaffolds, correct order of scaffolds), which can also help reduce gaps.

In addition to that, it is never a bad idea to analyze your data set with a bunch of different assemblers that are out there.


Yes, I appreciate your suggestion. After assembly I used CONTIGuator to map all the contigs against a good closed reference genome, and I kept only the mapped contigs to build a final pseudogenome. Any comments on this approach?
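
One way to judge this approach is to measure how much sequence is discarded when only mapped contigs are kept. The sketch below is illustrative only: it assumes you already have a dictionary of contig lengths and a set of contig IDs that mapped to the reference, and it does not parse CONTIGuator output, whose format is not described in this thread.

```python
def unmapped_summary(contig_lengths, mapped_ids):
    """Summarise what is lost when only reference-mapped contigs are kept.

    contig_lengths: dict of contig ID -> length in bp (hypothetical input)
    mapped_ids: set of contig IDs that mapped to the reference (hypothetical input)
    Returns (unmapped IDs sorted by length, largest first; bp discarded; fraction discarded).
    """
    unmapped = {cid: n for cid, n in contig_lengths.items() if cid not in mapped_ids}
    total_bp = sum(contig_lengths.values())
    lost_bp = sum(unmapped.values())
    fraction = lost_bp / total_bp if total_bp else 0.0
    ordered = sorted(unmapped, key=unmapped.get, reverse=True)
    return ordered, lost_bp, fraction
```

Large unmapped contigs are worth inspecting before they are dropped, since they may carry plasmids or other sequence that the reference lacks.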

8.7 years ago

Did you perform any read quality trimming before the SPAdes assembly? For E. coli I do not know of a better assembler than SPAdes; we usually get quite decent results with it. Do you run SPAdes with the additional BWA step after the initial assembly?

Also, depending on what you want to know, I would not advise mapping against a reference. If you want to detect virulence genes, resistance genes, or plasmids and you map against a reference, you will only detect those that are also present in the reference. E. coli has a very 'mobile' genome, with a lot of recombination, horizontal gene transfer, and plasmid exchange going on. If you want a low-resolution phylogenetic relationship between your strains, mapping against a reference is a good approach, but for functional analysis it is a definite no-no.


SPAdes does not need quality trimming before the run; it performs its own read error correction by default. Yes, the newer versions of SPAdes include BWA.

8.7 years ago

There are trusted E. coli genomes that you can use to compare against and to reorder your assembled contigs. You can do this with a program like Mauve, and there are tutorials showing how.

In my hands, with 100X coverage of E. coli, I got about as many contigs as you did, even after careful quality trimming and removal of putative adapter sequences.

I think you need to test different k-mer values, compare each result with a trusted genome, and, if you are not satisfied, add other data types such as mate-pair libraries, longer Illumina reads, or even long reads from PacBio. A colleague of mine tried to close a Pseudomonas genome for 7 years without full success, and eventually managed it using PacBio.
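
As a rough check on figures like the 100X coverage mentioned above, expected coverage can be estimated from the number of reads, the read length, and the genome size. This is a back-of-the-envelope sketch; the read count, read length, and genome size in the usage note are illustrative assumptions, not values from this thread.

```python
def estimated_coverage(n_reads, read_length, genome_size, paired=False):
    """Estimate mean sequencing depth as total sequenced bases divided by genome size.

    n_reads counts read pairs when paired=True, otherwise single reads.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    reads_per_fragment = 2 if paired else 1
    total_bases = n_reads * reads_per_fragment * read_length
    return total_bases / genome_size
```

For example, `estimated_coverage(1_000_000, 150, 5_000_000, paired=True)` models one million 2x150 bp read pairs on an assumed 5 Mb genome. The estimate ignores trimming losses and uneven coverage, so the real mean depth is usually somewhat lower.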


SPAdes already uses different k-mer values by default. I appreciate your suggestion of PacBio sequencing; we are now following that approach as well.
