Question

Help me choose an assembler (for something quite specific)

0

7.8 years ago

maxwhjohn1988 ▴ 130

I have a de novo assembly, produced with w2rap-contigger (a forked version of Discovar DeNovo, made at the Earlham Institute) from a single PE library. The assembly is not great (N50 ~ 55 kb) but it's not totally useless for me.

I am fishing for contigs of interest, and I am getting some good results. However, I don't know whether the contigs of interest are from the same chromosome(s) or whether they are scattered about the genome.

I am in the process of applying for some money to do some 10x Genomics sequencing, but while I wait I want to make use of some old MiSeq PE data which has been sitting around gathering dust. The MiSeq reads were never good enough to produce a decent assembly on their own, but I thought it is conceivable that they could improve my existing assembly.

I want to provide the contigs from the existing assembly, plus the MiSeq PE reads, as input to an assembler and see if I can improve anything. Discovar DeNovo (and w2rap) are designed for a single library, so I doubt it would be sensible to use it again for this. I was considering SPAdes, as it can do hybrid assembly, but that would require specifying the existing contigs as PacBio or Nanopore reads - presumably there would be some kind of error-correction applied as a result of doing that, which sounds undesirable.

Has anyone got any clever ideas, or solid reasons why this would be a waste of my time?

Assembly next-gen genome • 2.1k views

ADD COMMENT • link updated 7.8 years ago by colindaven 7.7k • written 7.8 years ago by maxwhjohn1988 ▴ 130

score 1 · Answer 1 · 2017-10-12

1

7.8 years ago

h.mon 35k

"I was considering SPAdes, as it can do hybrid assembly, but that would require specifying the existing contigs as PacBio or Nanopore reads - presumably there would be some kind of error-correction applied as a result of doing that, which sounds undesirable."

That is not quite right: SPAdes version 3.11.1 (and some earlier versions as well, though I don't know since when) accepts contigs from other assemblers as input; see the parameters --trusted-contigs and --untrusted-contigs.
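
As a rough illustration of the option above, here is a minimal Python sketch that only builds the spades.py command line for paired-end reads plus existing contigs. The file names and output directory are hypothetical placeholders, and the choice between --trusted-contigs and --untrusted-contigs is left to the user; defaulting to untrusted here is my assumption, not a recommendation from the SPAdes authors.

```python
import shlex


def build_spades_command(contigs, reads_1, reads_2, out_dir, trusted=False):
    """Return the spades.py argument list for PE reads plus existing contigs."""
    # Which flag to use depends on how much you trust the existing assembly.
    contig_flag = "--trusted-contigs" if trusted else "--untrusted-contigs"
    return [
        "spades.py",
        "-1", reads_1,          # forward MiSeq reads
        "-2", reads_2,          # reverse MiSeq reads
        contig_flag, contigs,   # contigs from the earlier assembly
        "-o", out_dir,
    ]


# Hypothetical file names, for illustration only.
cmd = build_spades_command(
    "w2rap_contigs.fasta",
    "miseq_R1.fastq.gz",
    "miseq_R2.fastq.gz",
    "spades_out",
)
print(shlex.join(cmd))
```

To actually launch the assembly, the list could be passed to subprocess.run(cmd, check=True), assuming spades.py is on the PATH.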

0

Thanks for the correction! You are quite right; I had completely forgotten about that option. Much appreciated. I might give this a try.

ADD REPLY • link 7.7 years ago by maxwhjohn1988 ▴ 130

score 1 · Answer 2 · 2017-10-13

1

7.8 years ago

colindaven 7.7k

I would suggest you merge the MiSeq PE reads using FLASH or similar.

Then put all the contigs into SOAPdenovo2; it's quite easy, fast and flexible.

Having said that, I don't think you will improve your assembly much. To check which chromosome each contig comes from, mapping the contigs to a well-sequenced related species might help.
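
One way to turn such a mapping into a per-contig chromosome assignment is sketched below in Python. It assumes the contigs were aligned to the related reference with MUMmer's nucmer and the alignments exported with show-coords -T -H, and that each tab-separated row then has the query alignment length in the sixth column and the reference and query names in the last two columns. That column layout is my assumption and should be checked against your own output. Overlapping alignments are counted twice, so treat this as a rough tally rather than a definitive placement.

```python
from collections import defaultdict


def assign_contigs(coords_lines):
    """Map each query contig to the reference sequence with the most aligned bases."""
    aligned = defaultdict(lambda: defaultdict(int))
    for line in coords_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank, header, or malformed rows.
        if len(fields) < 9 or not fields[0].isdigit():
            continue
        query_len = int(fields[5])                  # assumed [LEN 2] column
        ref_name, contig = fields[-2], fields[-1]   # assumed [TAGS] columns
        aligned[contig][ref_name] += query_len
    # Pick, for every contig, the reference with the largest aligned total.
    return {contig: max(refs, key=refs.get) for contig, refs in aligned.items()}


# Made-up rows standing in for real show-coords output.
sample = [
    "1\t1000\t1\t1000\t1000\t1000\t99.0\tchr1\tcontig_A",
    "5000\t5300\t1200\t1500\t301\t301\t95.0\tchr2\tcontig_A",
    "200\t2200\t1\t2000\t2001\t2000\t98.5\tchr2\tcontig_B",
]
assignments = assign_contigs(sample)
print(assignments)  # {'contig_A': 'chr1', 'contig_B': 'chr2'}
```

Contigs of interest that end up on the same reference chromosome are candidates for being linked, although synteny with a related species is only indirect evidence.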

0

Thanks - yes, I'm also dubious about whether I will be able to make any improvements.

I am currently trying to get outputs from nucmer in an intelligible graphical format, after doing exactly what you suggest :)

ADD REPLY • link 7.7 years ago by maxwhjohn1988 ▴ 130

0

I find the dotplot tool in UGENE excellent for comparing lots of different contigs rapidly. Its parameters are also easy to adjust.

ADD REPLY • link 7.7 years ago by colindaven 7.7k