Question: Best tools for genome assembly with paired-end, mate pair and linked read (10x) data
7 months ago, hirad.alipanah (0) wrote:

Hi everyone. We have the following table for our data: Data Table

We want to assemble the genome of the planarian Dugesia japonica. Its genome is approximately 1.5 Gbp, diploid, and very repetitive (similar to that of Schmidtea mediterranea, "Smed"). Which assembler would you suggest for assembling all of this data?
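When comparing libraries for a genome of this size, a rough per-library depth estimate (total sequenced bases divided by genome size) can help decide which datasets are worth feeding to an assembler. The sketch below is only an illustration: the 1.5 Gbp genome size comes from the question, but the library names, read counts, and read lengths are hypothetical placeholders, since the actual data table is not reproduced here.

```python
# Approximate genome size stated in the question (~1.5 Gbp).
GENOME_SIZE_BP = 1_500_000_000


def estimate_coverage(num_reads, read_length_bp, genome_size_bp=GENOME_SIZE_BP):
    """Return expected sequencing depth: total bases / genome size."""
    return num_reads * read_length_bp / genome_size_bp


# Hypothetical libraries: (number of reads, read length in bp).
# These are placeholders, NOT the values from the poster's data table.
libraries = {
    "paired_end_example": (300_000_000, 150),
    "mate_pair_example": (100_000_000, 100),
    "linked_reads_10x_example": (400_000_000, 150),
}

for name, (num_reads, read_length) in libraries.items():
    depth = estimate_coverage(num_reads, read_length)
    print(f"{name}: ~{depth:.1f}x")
```

Substituting the real read counts and lengths from the data table gives the per-library depth for this genome.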

Tags: assembly, genome

Could you clarify which species you want to assemble?

written 7 months ago by Nicolas Rosewick (7.6k)

Yes, the planarian Dugesia japonica.

written 7 months ago by hirad.alipanah (0)

Could you edit your question to add this information, plus the expected genome size, ploidy, etc.? Thanks.

written 7 months ago by Nicolas Rosewick (7.6k)

The 10x data may need to be handled separately; Supernova is what you would want to use there. I see some data from a GAII, which leads me to believe that you have collected different datasets over time. Are all these datasets from exactly the same sample/organism?

written 7 months ago by genomax (67k)

Yeah, that was our first option. But how do we combine the Supernova output with our other data? And yes, the datasets are all from the same organism, but from different samples.

written 7 months ago by hirad.alipanah (0)

I can recommend ABySS: it is very versatile, makes excellent use of a cluster, and performs well, though it might require some parameter tweaking (as with most assembly software). The same developers also provide tools for incorporating the 10x data.

written 7 months ago by lieven.sterck (4.7k)

Yes, we've considered that, too. But it needs a lot of memory (around 1 TB, and we only have 500 GB). Do you know of other assemblers that require less memory?

written 7 months ago by hirad.alipanah (0)

Perhaps SOAPdenovo is an option? (I have no experience with it myself, though.) MaSuRCA will likely also be too memory-intensive. I think in most cases you will still need to figure out how to include the 10x data, as there are very few tools (if any) that can process all of your data at once.

written 7 months ago by lieven.sterck (4.7k)