Question: Combined assembly analysis (short reads + long reads)
3.3 years ago by
XC20
Germany
XC20 wrote:

Dear NGS Experts,

I have a question about combined genome assembly.

We have 75X HiSeq coverage of an animal genome (about 3 Gb in size) together with 50X coverage from the PacBio Sequel system. We would now like to run a combined assembly of these 350 Gb of data. Does anybody know of any tools for this kind of analysis?

Many thanks.
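
For readers checking the numbers in the question, here is a minimal sketch of the coverage arithmetic. It assumes data volume is simply coverage times genome size and takes "about 3 Gb" as exactly 3e9 bp, so the result is only approximate. The function name `total_bases` is made up for illustration.

```python
def total_bases(genome_size_bp: float, coverages: dict[str, float]) -> dict[str, float]:
    """Return the expected number of sequenced bases per platform, plus a combined total.

    Assumes bases = coverage * genome size, which ignores read filtering and trimming.
    """
    per_platform = {name: cov * genome_size_bp for name, cov in coverages.items()}
    per_platform["total"] = sum(per_platform.values())
    return per_platform


# "about 3Gb" in the question; treated as exactly 3e9 bp here (an assumption).
GENOME_SIZE_BP = 3e9

yield_bp = total_bases(GENOME_SIZE_BP, {"HiSeq": 75, "PacBio Sequel": 50})
for name, bases in yield_bp.items():
    print(f"{name}: {bases / 1e9:.0f} Gb")
```

With these inputs the estimate is about 225 Gb of HiSeq data and 150 Gb of PacBio data, roughly 375 Gb in total, which is in the same range as the 350 Gb quoted given that the genome size is approximate.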

sequencing assembly genome • 1.8k views
modified 3.3 years ago by shwethacm200 • written 3.3 years ago by XC20

With 50X PacBio data you should be able to assemble the genome from the long reads alone (provided the data is of good quality). According to PacBio's recommendation, that should be enough coverage for a good assembly. You can try to assemble the HiSeq data independently and then see if you can combine the two later.

Can you comment on what the Sequel data looks like? There is a dearth of real datasets for Sequel.

modified 3.3 years ago • written 3.3 years ago by genomax70k

Hi genomax2, thanks a lot for your quick answer. We have a similar workflow planned. If there is a tool that can assemble both data types at the same time, that would be great, because the short reads can correct errors in the long reads and make them more reliable.

We are still waiting for the Sequel data from the sequencer. Once we have it, we will try to comment on it.

Thank you again.

written 3.3 years ago by XC20

FALCON is one option. I think it was used for the gorilla genome recently. There are plenty of other options on the Wiki page I linked in the previous post.
Since you are going to have plenty of PacBio data, you may not need to error-correct using Illumina (I cannot find the post from Dr. Hall of PacBio right now, but will update if I do).
Is this a diploid genome?

modified 3.3 years ago • written 3.3 years ago by genomax70k

Is there any update on the Sequel data? How are the quality and the price?

modified 2.6 years ago • written 2.6 years ago by pengchy410
3.3 years ago by
Rohit1.4k
California
Rohit1.4k wrote:

From my own experience with a diploid animal genome, error correction of PacBio reads with Illumina data takes time and resources. Proovread worked well for our data at lower coverage (15X PacBio, 20X HiSeq), but it demands many nodes: 1800 in our case, with a run time of 4 days per node. It is worth the wait, though, and the developer (Thomas Hackl) is very responsive.

Canu works well for a combined approach, from what I have heard.

At 50X PacBio you could go for self-error correction (PBcR) and then use Quiver to polish the genome with PacBio data alone. In the end, you could use the HiSeq data to finish the genome with the Pilon pipeline.
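
The choice described above can be sketched as a small planning helper. This is only an illustration of the order of steps named in this thread, not a runnable pipeline: it calls none of the tools, and the 50X cutoff, `Stage`, and `plan_assembly` are assumptions made up for the example. At low PacBio coverage it only records the proovread correction step, because the downstream assembler for that case is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    tool: str     # tool name as mentioned in the thread
    reads: str    # which data the step consumes
    purpose: str  # what the step is for


# Hypothetical cutoff: the thread only says 50X PacBio is enough for
# self-correction and that 15X was handled with hybrid correction.
SELF_CORRECTION_MIN_COVERAGE = 50.0


def plan_assembly(pacbio_coverage: float, have_illumina: bool) -> list[Stage]:
    """Return the ordered stages suggested in the thread for a given data set."""
    stages: list[Stage] = []
    if pacbio_coverage >= SELF_CORRECTION_MIN_COVERAGE:
        stages.append(Stage("PBcR", "PacBio", "self-error correction of long reads"))
        stages.append(Stage("Quiver", "PacBio", "polish the genome with PacBio data alone"))
        if have_illumina:
            stages.append(Stage("Pilon", "HiSeq", "finish the genome with short reads"))
    elif have_illumina:
        stages.append(Stage("proovread", "PacBio + HiSeq", "correct long reads with short reads"))
    return stages


for stage in plan_assembly(pacbio_coverage=50, have_illumina=True):
    print(f"{stage.tool}: {stage.purpose} ({stage.reads})")
```

For the 50X PacBio plus HiSeq case in the question, this lists PBcR, Quiver, and Pilon in that order.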

written 3.3 years ago by Rohit1.4k
3.3 years ago by
State College, PA, USA
Biomonika (Noolean)3.1k wrote:

I have recently heard good reviews of Canu.

written 3.3 years ago by Biomonika (Noolean)3.1k
3.3 years ago by
shwethacm200
Seattle, WA
shwethacm200 wrote:

FALCON for de novo assembly plus Quiver for base error correction is a good combination. I haven't tried the other approaches, but they constantly come up when we do assemblies.

You can also error-correct the PacBio reads using Illumina data and then assemble with Celera Assembler: https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/pacBioToCA That said, I think the reverse (as Rohit mentioned: assemble first, then run Pilon) is the more popular choice.

Here's a question: do you have mate-pair data? Most de novo PacBio assemblers give you contigs, which you can place into scaffolds if you have mate pairs.

written 3.3 years ago by shwethacm200