Question: Choosing a de novo genome assembler
9 months ago, s.kyungyong64 (Berkeley, USA) wrote:

Hi,

I have Ren-seq data from a plant (tomato), sequenced on PacBio (~420 Mb in FASTA), to assemble. The assembled genome is about 2.0 Mb in size. So far I have tried Geneious and Canu. The result from Canu was better than the one from Geneious, but I think I may have to try some other software. Do you have any recommendations that might be worth trying?
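For readers following along, the two sizes quoted above allow a rough depth estimate. This is only a sketch: it assumes the ~420 Mb figure approximates the total number of sequenced bases and that the ~2.0 Mb assembly size is a fair stand-in for the size of the target region. The function name `fold_coverage` is made up for illustration.

```python
def fold_coverage(total_read_bases: float, target_size_bases: float) -> float:
    """Average depth = total sequenced bases / size of the region being assembled."""
    if target_size_bases <= 0:
        raise ValueError("target size must be positive")
    return total_read_bases / target_size_bases


# Figures taken from the post; treating them as base counts is an assumption.
read_bases = 420e6   # ~420 Mb of PacBio reads
target_size = 2.0e6  # ~2.0 Mb assembled size, used as a proxy for the target size

print(f"Approximate coverage: {fold_coverage(read_bases, target_size):.0f}x")
```

Under those assumptions the data set corresponds to roughly 210x average depth over the assembled region.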

Thanks

rna-seq assembly genome • 435 views
modified 8 months ago by Roxane Boyer (430) • written 9 months ago by s.kyungyong64 (0)

You have a tiny amount of data, so clearly this is not the raw output of a SMRT cell. Can you describe it in more detail? Are these CCS reads, or consensus after doing correction, or what?

written 9 months ago by Brian Bushnell (14k)

Thanks for catching that. I corrected it. It is the output of a SMRT cell.

written 9 months ago by s.kyungyong64 (0)
8 months ago, Roxane Boyer (430, France/Marseille/IBDM) wrote:

Hi !

I have also performed genome assembly using PacBio data and the Canu assembler, and I was really satisfied with it.

If you want to try something else, you can try the Falcon assembler from Pacific Biosciences ( https://github.com/PacificBiosciences/FALCON ). Falcon aims to output a diploid assembly, in which heterozygous regions of the genome are written to a separate file. I'm just warning you that PacBio tools are actually being deeply changed (they want to leave the bas/bax//cmp.h5 files extensions to propose classic fasta/sam/bam files.

The PacBio tools, Falcon included, are quite complicated to install. The two classic ways are to download all the dependencies from GitHub yourself (the hard way), or to use their tool called pitchfork (but I wouldn't recommend that; PacBio engineers themselves call it "the painful way"...).

If you want to use PacBio tools on the command line, I recommend following these steps, which I gave on GitHub to someone else who was struggling with installation: https://github.com/PacificBiosciences/pbalign/issues/67#issuecomment-272964848

Since your genome is small (only 2 Mb?), you can also try assembling it through SMRT Portal, for example with the HGAP 3 protocol.

pbalign and quiver are very important because, with a PacBio assembly, the error rate after assembly is still around 1%. You can lower this error rate using your raw reads; this step is called polishing, and you can use pbalign + quiver for it.
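To give a feel for what an error rate of around 1% means on a ~2 Mb assembly, here is a small back-of-the-envelope sketch. It assumes errors are spread uniformly over the assembly, and it uses the standard Phred definition Q = -10 * log10(p); the function names are hypothetical and not part of any PacBio tool.

```python
import math


def phred_quality(error_rate: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    if not 0 < error_rate <= 1:
        raise ValueError("error rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


def expected_errors(assembly_size_bases: float, error_rate: float) -> float:
    """Expected number of erroneous bases, assuming a uniform error rate."""
    return assembly_size_bases * error_rate


assembly_size = 2.0e6  # ~2 Mb, from the question
error_rate = 0.01      # ~1% after assembly, from the answer above

print(f"Phred quality: Q{phred_quality(error_rate):.0f}")
print(f"Expected erroneous bases: {expected_errors(assembly_size, error_rate):.0f}")
```

So an unpolished assembly at about 1% error sits near Q20, which on 2 Mb is on the order of 20,000 wrong bases. That is why the polishing step is worth the effort.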

If you have questions about polishing or tool installation, I can help; I've been through the same steps!

Good luck,

Roxane

modified 8 months ago • written 8 months ago by Roxane Boyer (430)

Great post! But, can I ask you to clarify this line:

"I'm just warning you that PacBio tools are actually being deeply changed (they want to leave the bas/bax//cmp.h5 files extensions to propose classic fasta/sam/bam files."

It's not quite clear what it means - whether PacBio is migrating to or from which formats. Thanks!

written 8 months ago by Brian Bushnell (14k)

The latest PacBio software (SMRT* v3.0 onward) produces BAM file(s) as output. Doc_1, Doc_2, Doc_3

PacBio does "make use of the extensibility mechanisms of the BAM specification to encode PacBio-specific information".

modified 8 months ago • written 8 months ago by genomax (37k)

Oh, nice! That's much less annoying than h5.

written 8 months ago by Brian Bushnell (14k)