Choosing Optimal Assembly from QUAST Data
10.1 years ago

Hello Everyone,

I am very new to next-generation sequencing and have a question about choosing which assembly is best to use going forward. My samples are isolates of Helicobacter pylori that I sent for whole genome sequencing. I have paired-end Illumina reads; I used Trimmomatic to process them and FastQC to check that everything looked acceptable. I then tried de novo assembly of the forward and reverse paired reads using Velvet, ABySS and SPAdes. Finally, I took the contig files produced by these three assemblers and ran them through QUAST to evaluate which assembly worked best. I have attached links to the alignment produced and the summary file.

Alignment:

https://drive.google.com/file/d/0B1G2M5ad3_x4Vm0tbzBhMGJSVVk/view?usp=sharing

Summary File

https://drive.google.com/file/d/0B1G2M5ad3_x4amtFc3RiNkxRMVE/view?usp=sharing

ABySS and SPAdes had similar output, with SPAdes perhaps being marginally better based on # contigs, largest contig, and N50. Velvet was quite different from ABySS and SPAdes and had far fewer misassemblies (6 vs 35 for ABySS and 31 for SPAdes). I am not sure what would account for this large difference.

If anyone could point me in the right direction as to which assembly is best to use and/or how to improve my assemblies, I would really appreciate it. Like I said, I am super new to NGS and have limited computing skills, so this has been a huge learning experience for me (but a fun one!).

Thanks in advance!

Assembly next-gen • 2.4k views

Did you use the same minimum contig length threshold for all three assemblies? If not, the number of contigs and the N50 value are pretty much meaningless.
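To see why the threshold matters, here is a minimal Python sketch that computes N50 for the same set of contigs under two different minimum-length cutoffs. The function name `n50` and the toy contig lengths are made up for illustration; they are not taken from the QUAST report in this thread.

```python
def n50(lengths, min_len=0):
    """Return the N50 of the contigs that are at least min_len long (0 if none)."""
    kept = sorted((n for n in lengths if n >= min_len), reverse=True)
    half = sum(kept) / 2
    running = 0
    for n in kept:
        running += n
        # N50 is the length of the contig at which the running total
        # first reaches half of the total assembly length.
        if running >= half:
            return n
    return 0


# Hypothetical assembly: three long contigs plus 400 tiny 250 bp contigs.
contigs = [100_000, 50_000, 20_000] + [250] * 400

for cutoff in (200, 500):
    n_kept = sum(1 for c in contigs if c >= cutoff)
    print(f"min length {cutoff}: {n_kept} contigs, N50 = {n50(contigs, cutoff)}")
# min length 200: 403 contigs, N50 = 50000
# min length 500: 3 contigs, N50 = 100000
```

The underlying assembly is identical in both cases; only the cutoff changed, yet the contig count and N50 differ substantially.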


Yes, I used the same minimum contig length of 200 for all three assemblies.

10.1 years ago
h.mon 35k

Genome assembly and assembly evaluation are not easy tasks, even for microbial genomes. The list of possible suggestions is long, and which ones apply depends on what you want to do with the genomes and on how much effort, time (and money) you are willing to invest. You also omit a lot of information, such as what you want from your genomes, how many isolates you have, depth of sequencing, insert size, read length, etc. As for suggestions:

Wet lab: with some primers and PCRs, you could check if the misassemblies are "true" misassemblies (i.e., assembly artifacts) or "false" (i.e., evaluation artifacts). Also, primers and PCRs could help you to scaffold the contigs. Depending on how badly you need a closed and (quasi-)complete genome, you could do some PacBio or mate pair + Illumina sequencing - this is expensive, though.

Evaluation: try some reference-free evaluations, such as ALE, FRCbam or CheckM. Map reads back to draft genomes, get insert size, coverage, mapping statistics, etc.
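As one illustration of the "map reads back and get coverage" step, here is a rough Python sketch that summarises per-contig mean depth. It assumes the reads have already been mapped to the draft assembly and that per-position depth was written as three tab-separated columns (contig, position, depth), which is the layout `samtools depth -a` produces. The function name `mean_depth_per_contig` is hypothetical.

```python
from collections import defaultdict


def mean_depth_per_contig(lines):
    """Average the depth column per contig from 'contig<TAB>pos<TAB>depth' lines."""
    total = defaultdict(int)   # summed depth per contig
    count = defaultdict(int)   # number of positions seen per contig
    for line in lines:
        if not line.strip():
            continue
        contig, _pos, depth = line.rstrip("\n").split("\t")
        total[contig] += int(depth)
        count[contig] += 1
    return {contig: total[contig] / count[contig] for contig in total}


# Tiny made-up example; in practice, pass an open file handle instead.
example = ["c1\t1\t10\n", "c1\t2\t20\n", "c2\t1\t0\n"]
print(mean_depth_per_contig(example))
```

Contigs whose mean depth is far above or below the rest are worth a closer look, since they may be collapsed repeats, plasmids, or contamination.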

Assembly: try A5_MiSeq; I got good results with it for some microbial genomes. I think you should use only contigs > 500 bp, as the smaller contigs are frequently noise / contamination. You could try blasting them or mapping them to a reference genome to see if they make sense.
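Dropping the short contigs needs nothing more than a small FASTA filter. The sketch below uses only the Python standard library; the helper names `read_fasta` and `filter_contigs` are made up for this example, and the demo sequences are toy data.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    name, seq = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(seq)
            name, seq = line[1:], []
        else:
            seq.append(line)
    if name is not None:
        yield name, "".join(seq)


def filter_contigs(in_handle, out_handle, min_len=500):
    """Write contigs strictly longer than min_len; return how many were kept."""
    kept = 0
    for name, seq in read_fasta(in_handle):
        if len(seq) > min_len:
            out_handle.write(f">{name}\n{seq}\n")
            kept += 1
    return kept


# Toy demo with in-memory "files"; with real data, use open("contigs.fasta") etc.
demo_in = io.StringIO(">long\n" + "A" * 600 + "\n>short\n" + "C" * 100 + "\n")
demo_out = io.StringIO()
print(filter_contigs(demo_in, demo_out), "contig(s) kept")
```

The discarded short contigs can be written to a second file in the same way if you want to BLAST them separately.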
