Question: Help me choose an assembler (for something quite specific)
maxwhjohn1988 wrote (9 months ago):

I have a de novo assembly, produced with w2rap-contigger (a forked version of Discovar DeNovo, made at the Earlham Institute) from a single PE library. The assembly is not great (N50 ~ 55 kb) but it's not totally useless for me.
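For readers unfamiliar with the N50 figure quoted above, here is a minimal sketch of how it is usually computed from a list of contig lengths. The function name and the toy lengths are invented for illustration; they are not taken from the assembly described in this post.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Toy lengths in bp, invented for illustration only.
toy_lengths = [80_000, 70_000, 55_000, 30_000, 20_000, 10_000]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

Computing this before and after adding the MiSeq reads would be one simple way to see whether the contiguity of the assembly changed at all.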

I am fishing for contigs of interest, and I am getting some good results. However, I don't know whether the contigs of interest are from the same chromosome(s) or whether they are scattered about the genome.

I am in the process of applying for some money to do some 10x Genomics sequencing, but while I wait I want to make use of some old MiSeq PE data which has been sitting around gathering dust. The MiSeq reads were never good enough to produce a decent assembly on their own, but it seems conceivable that they could improve my existing assembly.

I want to provide the contigs from the existing assembly, plus the MiSeq PE reads, as input to an assembler and see if I can improve anything. Discovar DeNovo and w2rap are designed for a single library, so I doubt it would be sensible to use either of them again for this. I was considering SPAdes, as it can do hybrid assembly, but that would require specifying the existing contigs as PacBio or Nanopore reads. Presumably some kind of error correction would then be applied to them, which sounds undesirable.

Has anyone got any clever ideas, or solid reasons why this would be a waste of my time?

Tags: next-gen, assembly, genome
h.mon (Brazil) wrote (9 months ago):

> I was considering SPAdes, as it can do hybrid assembly, but that would require specifying the existing contigs as PacBio or Nanopore reads - presumably there would be some kind of error-correction applied as a result of doing that, which sounds undesirable.

You are incorrect: SPAdes version 3.11.1 (and some earlier versions as well, though I don't know since when) accepts contigs from other assemblers as input; see the parameters --trusted-contigs and --untrusted-contigs.
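As a rough sketch of what such an invocation could look like, the snippet below only builds and prints a SPAdes command line; it does not run anything. The file names and output directory are hypothetical placeholders, and whether the existing contigs deserve `--trusted-contigs` or `--untrusted-contigs` is a judgment call that depends on how much you trust the first assembly.

```python
import shlex


def spades_command(reads_1, reads_2, contigs, out_dir, trusted=True):
    """Build a spades.py argument list for paired-end reads plus existing contigs."""
    contig_flag = "--trusted-contigs" if trusted else "--untrusted-contigs"
    return [
        "spades.py",
        "-1", reads_1,         # forward reads of the paired-end library
        "-2", reads_2,         # reverse reads of the paired-end library
        contig_flag, contigs,  # contigs from the earlier assembly
        "-o", out_dir,         # output directory
    ]


# Hypothetical file names, used for illustration only.
cmd = spades_command(
    "miseq_R1.fastq.gz",
    "miseq_R2.fastq.gz",
    "w2rap_contigs.fasta",
    "spades_out",
)
print(shlex.join(cmd))
# To actually launch it, one could pass cmd to subprocess.run(cmd, check=True).
```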


Thanks for the correction! You are quite correct, I had completely forgotten about that option. Much appreciated, I might give this a try.

(reply by maxwhjohn1988, 9 months ago)
colindaven (Hannover Medical School) wrote (9 months ago):

I would suggest you merge the overlapping PE MiSeq reads using FLASH or a similar tool.

Then put all the contigs into SOAPdenovo2; it's quite easy, fast and flexible.

Having said that, I don't think you will improve your assembly much. To check which chromosome each contig comes from, perhaps mapping the contigs to a well-sequenced related species might help you out?
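To make the mapping idea concrete, here is a minimal sketch of how contigs could be assigned to reference chromosomes once alignments exist. It assumes the alignments have already been reduced to (contig, reference sequence, aligned bases) tuples, for example from a whole-genome aligner's coordinate output; that tuple layout, the function name, and the toy values are all assumptions made for illustration.

```python
from collections import defaultdict


def assign_contigs(alignments):
    """Map each contig to the reference sequence with the most aligned bases.

    alignments: iterable of (contig, ref_seq, aligned_bp) tuples.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for contig, ref_seq, aligned_bp in alignments:
        totals[contig][ref_seq] += aligned_bp
    # For every contig, keep the reference sequence with the largest total.
    return {
        contig: max(per_ref, key=per_ref.get)
        for contig, per_ref in totals.items()
    }


# Toy alignments, invented for illustration only.
toy_alignments = [
    ("contig_1", "chr2", 12_000),
    ("contig_1", "chr5", 3_000),
    ("contig_2", "chr5", 8_000),
    ("contig_1", "chr2", 4_000),
]
best_hits = assign_contigs(toy_alignments)
print(best_hits)
```

Contigs of interest that share a best-hit chromosome in the related species are candidates for being linked, although rearrangements between the two species mean this is only suggestive.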


Thanks - yes, I'm also dubious about whether I will be able to make any improvements.

I am currently trying to get outputs from nucmer in an intelligible graphical format, after doing exactly what you suggest :)

(reply by maxwhjohn1988, 9 months ago)

I find the dot plot tool in UGENE excellent for rapidly comparing lots of different contigs. Its parameters are also easy to adjust.

(reply by colindaven, 9 months ago)