Question: Choosing Optimal Assembly from QUAST Data
5.4 years ago by
RichelleRedekop0 wrote:

Hello Everyone,

I am very new to next-generation sequencing and have a question about choosing which assembly is best to use going forward. My samples are isolates of Helicobacter pylori that I sent for whole-genome sequencing. I have paired-end Illumina reads; I processed them with Trimmomatic and ran FastQC to make sure everything looked acceptable. I then tried de novo assembly of the forward and reverse paired reads using Velvet, ABySS and SPAdes, and ran the contig files from these three assemblers through QUAST to evaluate which assembly worked best. I have attached links to the alignment produced and the summary file.


Summary File

ABySS and SPAdes had similar output, with SPAdes perhaps being marginally better based on # contigs, largest contig, and N50. Velvet was quite different from ABySS and SPAdes and had far fewer misassemblies (6 vs. 35 for ABySS and 31 for SPAdes). I am not sure what would account for this large difference.

If anyone could point me in the right direction as to which assembly is best to use and/or how to improve my assemblies, I would really appreciate it. Like I said, I am super new to NGS and have limited computing skills, so this has been a huge learning experience for me (but a fun one!).

Thanks in advance!

next-gen assembly • 1.6k views
modified 5.4 years ago by h.mon32k • written 5.4 years ago by RichelleRedekop0

Did you use the same minimum contig length threshold for all three assemblies? If not, the number of contigs and the N50 value are pretty much meaningless for comparison.

written 5.4 years ago by 5heikki9.2k

Yes, I used the same minimum contig length of 200 bp for all three assemblies.

written 5.4 years ago by RichelleRedekop0
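
To illustrate why the minimum contig length matters when comparing assemblies, here is a minimal Python sketch of an N50 calculation with a length cutoff. The function name `n50`, the `min_len` parameter, and the contig lengths are made up for illustration; they are not taken from the QUAST report in this thread, and QUAST's own implementation may differ in detail.

```python
def n50(lengths, min_len=0):
    """Return the N50 of the contigs that are at least min_len long.

    N50 is the length of the contig at which the running total of
    lengths (sorted longest first) reaches half of the total assembly size.
    """
    kept = sorted((n for n in lengths if n >= min_len), reverse=True)
    if not kept:
        return 0
    half = sum(kept) / 2
    running = 0
    for n in kept:
        running += n
        if running >= half:
            return n


# Hypothetical contig lengths (bp), chosen only to show the effect.
contigs = [100, 150, 200, 300, 400, 1000, 2000]

print(n50(contigs))               # all contigs included
print(n50(contigs, min_len=500))  # short contigs dropped before computing N50
```

With these toy numbers the same assembly gives a different N50 depending on the cutoff, which is why the statistic is only comparable across assemblers when the threshold is identical.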
5.4 years ago by
h.mon32k wrote:

Genome assembly and assembly evaluation are not easy tasks, even for microbial genomes. The list of possible suggestions is long, and which ones apply depends on what you want to do with the genomes and on how much effort, time (and money) you are willing to invest. You also omit a lot of information, such as: what you want from your genomes, how many isolates you have, the depth of sequencing, insert size, read length, etc. As for suggestions:

Wet lab: with some primers and PCRs, you could check if the misassemblies are "true" misassemblies (i.e., assembly artifacts) or "false" (i.e., evaluation artifacts). Also, primers and PCRs could help you to scaffold the contigs. Depending on how badly you need a closed and (quasi-)complete genome, you could do some PacBio or mate pair + Illumina sequencing - this is expensive, though.

Evaluation: try some reference-free evaluations, such as ALE, FRCbam or CheckM. Map reads back to draft genomes, get insert size, coverage, mapping statistics, etc.
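
As one way to get per-contig coverage after mapping reads back to a draft assembly, here is a minimal Python sketch. It assumes the reads have already been mapped and that a depth table was produced with `samtools depth -a`, which writes three tab-separated columns (contig name, position, depth). The function name `mean_depth_per_contig` and the example lines are hypothetical.

```python
from collections import defaultdict


def mean_depth_per_contig(lines):
    """Compute mean read depth per contig from `samtools depth`-style lines.

    Each line is expected to look like: contig<TAB>position<TAB>depth
    """
    totals = defaultdict(int)   # summed depth per contig
    counts = defaultdict(int)   # number of positions seen per contig
    for line in lines:
        line = line.strip()
        if not line:
            continue
        contig, _pos, depth = line.split("\t")
        totals[contig] += int(depth)
        counts[contig] += 1
    return {contig: totals[contig] / counts[contig] for contig in totals}


# Hypothetical depth lines; in practice, pass an open file handle instead.
example = [
    "contig_1\t1\t10",
    "contig_1\t2\t20",
    "contig_2\t1\t0",
]
print(mean_depth_per_contig(example))
```

Contigs whose mean depth is far below or far above the rest are worth a closer look. Note that without `-a`, positions with zero depth are omitted from the table, so the averages would only cover positions with at least one read.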

Assembly: try A5_MiSeq; I got good results with it for some microbial genomes. I think you should use only contigs > 500 bp, since the smaller contigs are frequently noise or contamination. You could try BLASTing or mapping them to a reference genome to see if they make sense.
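
The length filter suggested above could be sketched in plain Python as follows. This is only an illustration: the function names `read_fasta` and `split_by_length` and the toy sequences are hypothetical, the cutoff is treated as inclusive here, and the short contigs are kept in a separate list so they can still be BLASTed or mapped to a reference.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_by_length(records, min_len=500):
    """Split contigs into (kept, short) lists using a minimum length in bp."""
    kept, short = [], []
    for header, seq in records:
        if len(seq) >= min_len:
            kept.append((header, seq))
        else:
            short.append((header, seq))
    return kept, short


# Toy example with a tiny cutoff; for a real assembly, pass an open
# contigs file to read_fasta and leave min_len at 500.
toy = [">contig_a", "ACGT", "ACGT", ">contig_b", "AC"]
kept, short = split_by_length(read_fasta(toy), min_len=5)
print([name for name, _ in kept], [name for name, _ in short])
```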


written 5.4 years ago by h.mon32k