Question: Help me choose an assembler (for something quite specific)
11 days ago, maxwhjohn1988 wrote:

I have a de novo assembly, produced with w2rap-contigger (a forked version of Discovar DeNovo, made at the Earlham Institute) from a single PE library. The assembly is not great (N50 ~ 55 kb) but it's not totally useless for me.

I am fishing for contigs of interest, and I am getting some good results. However, I don't know whether the contigs of interest are from the same chromosome(s) or whether they are scattered about the genome.

I am in the process of applying for some money to do some 10x Genomics sequencing, but while I wait I want to make use of some old MiSeq PE data which has been sitting around gathering dust. The MiSeq reads were never good enough to produce a decent assembly on their own, but I thought it is conceivable that they could improve my existing assembly.

I want to provide the contigs from the existing assembly, plus the MiSeq PE reads, as input to an assembler and see if I can improve anything. Discovar DeNovo (and w2rap) are designed for a single library, so I doubt it would be sensible to use it again for this. I was considering SPAdes, as it can do hybrid assembly, but that would require specifying the existing contigs as PacBio or Nanopore reads - presumably there would be some kind of error-correction applied as a result of doing that, which sounds undesirable.

Has anyone got any clever ideas, or solid reasons why this would be a waste of my time?

next-gen assembly genome • 155 views
10 days ago, h.mon (Brazil) wrote:

> I was considering SPAdes, as it can do hybrid assembly, but that would require specifying the existing contigs as PacBio or Nanopore reads - presumably there would be some kind of error-correction applied as a result of doing that, which sounds undesirable.

That is not quite right: SPAdes version 3.11.1 (and some earlier versions as well, though I don't know since when) accepts contigs from other assemblers as input. See the parameters --trusted-contigs and --untrusted-contigs.
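For anyone wanting to try this, here is a minimal sketch of how such a run might be assembled. It only builds and prints the command line rather than running it. The file names and output directory are placeholders (assumptions, not from the thread); `-1`, `-2`, `-o` and the two contig options are the standard `spades.py` options, but check them against the manual for your installed version.

```python
import shlex


def spades_command(contigs, reads_1, reads_2, out_dir, trusted=True):
    """Build a spades.py command that adds existing contigs to a paired-end run.

    All paths are hypothetical placeholders supplied by the caller.
    """
    # --trusted-contigs for contigs believed to be reliable,
    # --untrusted-contigs for lower-confidence ones.
    contig_flag = "--trusted-contigs" if trusted else "--untrusted-contigs"
    return [
        "spades.py",
        "-1", reads_1,         # forward MiSeq reads
        "-2", reads_2,         # reverse MiSeq reads
        contig_flag, contigs,  # contigs from the existing assembly
        "-o", out_dir,         # output directory
    ]


if __name__ == "__main__":
    cmd = spades_command(
        "w2rap_contigs.fasta",  # placeholder file names
        "miseq_R1.fastq.gz",
        "miseq_R2.fastq.gz",
        "spades_out",
    )
    print(shlex.join(cmd))
```

Whether to pass the w2rap contigs as trusted or untrusted is a judgment call that depends on how much you believe the existing assembly.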


Thanks for the correction! You are quite right; I had completely forgotten about that option. Much appreciated, I might give this a try.

Reply written 6 days ago by maxwhjohn1988
9 days ago, colindaven wrote:

I would suggest you merge the overlapping MiSeq PE reads using FLASH or similar.

Then put all the contigs into SOAPdenovo2; it's quite easy, fast and flexible.
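The read-merging step could be sketched roughly as below. This only constructs a FLASH command line and lists the files FLASH normally writes; it does not run anything. The executable name `flash`, the file names, the prefix and the output directory are assumptions for illustration. `-o` (output prefix), `-d` (output directory) and `-M` (maximum overlap) are FLASH options, but verify them against `flash --help` for your version.

```python
import shlex


def flash_command(reads_1, reads_2, prefix="miseq_merged",
                  out_dir="flash_out", max_overlap=None):
    """Build a FLASH command that merges overlapping read pairs.

    Paths, prefix and output directory are hypothetical placeholders.
    """
    cmd = ["flash", "-o", prefix, "-d", out_dir]
    if max_overlap is not None:
        # Longer MiSeq reads often overlap more than FLASH's default allows.
        cmd += ["-M", str(max_overlap)]
    cmd += [reads_1, reads_2]
    return cmd


def flash_outputs(prefix="miseq_merged", out_dir="flash_out"):
    """File names FLASH conventionally writes for a given prefix."""
    return {
        "merged": f"{out_dir}/{prefix}.extendedFrags.fastq",
        "unmerged_1": f"{out_dir}/{prefix}.notCombined_1.fastq",
        "unmerged_2": f"{out_dir}/{prefix}.notCombined_2.fastq",
    }


if __name__ == "__main__":
    print(shlex.join(flash_command("miseq_R1.fastq", "miseq_R2.fastq",
                                   max_overlap=250)))
    print(flash_outputs())
```

The merged reads would then be supplied to the downstream assembler as single-end reads, with the unmerged pairs kept as a paired library.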

Having said that, I don't think you will improve your assembly much. To check which chromosome each contig comes from, perhaps mapping the contigs to a well-sequenced related species might help?


Thanks - yes, I'm also dubious about whether I will be able to make any improvements.

I am currently trying to get outputs from nucmer in an intelligible graphical format, after doing exactly what you suggest :)

Reply written 6 days ago by maxwhjohn1988
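As a non-graphical complement to plotting, the nucmer alignments can also be tabulated to see which reference chromosome each contig mostly hits. The sketch below assumes the delta file has been converted to tab-delimited text with `show-coords -T`, so that column 6 is the aligned length on the query (contig) and the last two columns are the reference and query sequence names. The function name and the example rows are made up for illustration. Overlapping alignments are simply summed, so treat the totals as a rough guide only.

```python
from collections import defaultdict


def best_reference_per_contig(coords_lines):
    """Return {contig: (reference_name, aligned_query_bases)} for the top hit.

    Expects lines in `show-coords -T` layout (an assumption):
    S1, E1, S2, E2, LEN1, LEN2, %IDY, [optional columns], REF, QUERY.
    """
    aligned = defaultdict(lambda: defaultdict(int))
    for line in coords_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # blank lines and the file-path/program header lines
        try:
            query_aln_len = int(fields[5])  # LEN2: aligned bases on the contig
        except ValueError:
            continue  # column-name header line
        ref_name, contig = fields[-2], fields[-1]
        aligned[contig][ref_name] += query_aln_len
    # For each contig, keep the reference sequence with the most aligned bases.
    return {
        contig: max(refs.items(), key=lambda item: item[1])
        for contig, refs in aligned.items()
    }


if __name__ == "__main__":
    # Made-up rows, purely to show the expected layout.
    example = [
        "1\t1000\t1\t1000\t1000\t1000\t99.0\tchr1\tcontigA",
        "5000\t5500\t1200\t1700\t501\t501\t98.0\tchr1\tcontigA",
        "10\t300\t2000\t2290\t291\t291\t95.0\tchr2\tcontigA",
        "1\t800\t1\t800\t800\t800\t97.5\tchr2\tcontigB",
    ]
    for contig, (ref, bases) in sorted(best_reference_per_contig(example).items()):
        print(contig, ref, bases)
```

If the contigs of interest mostly land on one or two reference chromosomes, that suggests they are linked, although rearrangements between the two species can mislead.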

I find the dotplot tool in Ugene to be excellent for comparing lots of different contigs rapidly. Its parameters are also easy to adjust.

Reply written 6 days ago by colindaven