Choosing a de novo genome assembler
4.9 years ago

Hi,

I have Ren-seq data from a plant (tomato), sequenced on PacBio (~420 Mb of FASTA), to assemble. The assembled genome is about 2.0 Mb in size. So far I have tried Geneious and Canu. The result from Canu was better than the one from Geneious, but I think I should try some other software as well. Do you have any recommendations that might be worth trying?
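As a rough sanity check on the numbers above, nominal coverage can be estimated by dividing the amount of read data by the assembled size. This is only a sketch: it assumes that ~420 Mb of FASTA corresponds to roughly 420 million bases and that all reads come from the ~2.0 Mb target, which may not hold for this data set. The function name is made up for illustration.

```python
def nominal_coverage(total_read_bases: float, target_size_bases: float) -> float:
    """Return average depth: total sequenced bases divided by target size."""
    if target_size_bases <= 0:
        raise ValueError("target size must be positive")
    return total_read_bases / target_size_bases


# Assumption: ~420 Mb of FASTA is roughly 420 million bases.
read_bases = 420e6
# ~2.0 Mb assembled size, as stated in the post.
target_size = 2.0e6

print(f"Nominal coverage: {nominal_coverage(read_bases, target_size):.0f}x")
```

If a large share of the reads falls outside the assembled regions, the effective depth will be lower than this nominal figure.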

Thanks

RNA-Seq genome Assembly • 2.2k views

You have a tiny amount of data, so clearly this is not the raw output of a SMRT cell. Can you describe it in more detail? Are these CCS reads, a consensus after correction, or something else?


Thanks for catching that. I corrected it. It is the output of a SMRT cell.

4.9 years ago
Rox ★ 1.4k

Hi!

I have also assembled a genome from PacBio data with the Canu assembler, and I was really satisfied with it.

If you want to try something else, you can try the Falcon assembler from Pacific Biosciences ( https://github.com/PacificBiosciences/FALCON ). Falcon aims to produce a diploid assembly, in which heterozygous regions of the genome are written to a separate file. I'm just warning you that PacBio tools are actually being deeply changed (they want to leave the bas/bax//cmp.h5 files extensions to propose classic fasta/sam/bam files).

The PacBio tools, Falcon included, are quite complicated to install. The two usual ways are to download all the dependencies from GitHub yourself (the hard way), or to use their tool called pitchfork (but I wouldn't recommend that; PacBio engineers themselves call it "the painful way"...).

If you want to use PacBio tools on the command line, I recommend following the steps I suggested to someone else (who was struggling with the installation) on GitHub: https://github.com/PacificBiosciences/pbalign/issues/67#issuecomment-272964848

As your genome is small (only 2 Mb?), you can also try assembling it through SMRT Portal, using for example the HGAP 3 protocol.

pbalign and quiver are very important because, with a PacBio assembly, the error rate after assembly is still around 1%. You can lower this error rate using your raw reads; this step is called polishing, and you can use pbalign + quiver for it.
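To put the ~1% figure in perspective, here is a small sketch that converts an error rate into a Phred-style quality value and an expected number of erroneous bases for a 2 Mb assembly. The 1% rate and the 2 Mb size come from this thread; the lower rates in the loop are purely illustrative values, not measured results of pbalign + quiver.

```python
import math


def phred_quality(error_rate: float) -> float:
    """Convert a per-base error rate into a Phred-scaled quality value."""
    if not 0 < error_rate <= 1:
        raise ValueError("error rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


def expected_errors(assembly_length: int, error_rate: float) -> float:
    """Expected number of erroneous bases at a uniform per-base error rate."""
    return assembly_length * error_rate


assembly_length = 2_000_000  # ~2 Mb, as discussed in the thread

# 0.01 is the unpolished rate quoted above; the other rates are hypothetical.
for rate in (0.01, 0.001, 0.0001):
    print(
        f"error rate {rate:.4f}: "
        f"Q{phred_quality(rate):.0f}, "
        f"about {expected_errors(assembly_length, rate):,.0f} erroneous bases"
    )
```

Even on a small assembly, a 1% error rate leaves many wrong bases, which is why polishing matters before any downstream analysis such as gene annotation.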

If you have any questions about polishing or tool installation, I can help; I've been through the same steps!

Good luck,

Roxane


Great post! But, can I ask you to clarify this line:

"I'm just warning you that PacBio tools are actually being deeply changed (they want to leave the bas/bax//cmp.h5 files extensions to propose classic fasta/sam/bam files."

It's not quite clear what it means - whether PacBio is migrating to or from which formats. Thanks!


The latest PacBio software (SMRT* v3.0 onward) produces BAM file(s) as output. Doc_1, Doc_2, Doc_3

PacBio does "make use of the extensibility mechanisms of the BAM specification to encode PacBio-specific information".
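The extensibility mechanism in question is the optional TAG:TYPE:VALUE fields that SAM/BAM allows at the end of each record. Below is a minimal sketch of reading such fields from the text (SAM) view of a record. The tag names `zm`, `np` and `rq` are ones I believe PacBio's BAM format uses, but treat them and all the values as illustrative assumptions; the helper function is hypothetical and handles only the simplest type codes.

```python
def parse_optional_fields(fields):
    """Parse SAM optional fields of the form TAG:TYPE:VALUE into a dict.

    Only integer (i) and float (f) types are converted; every other type
    (for example Z strings or B arrays) is kept as the raw string.
    """
    parsed = {}
    for field in fields:
        # Split at most twice, so values that contain ':' stay intact.
        tag, type_code, value = field.split(":", 2)
        if type_code == "i":
            parsed[tag] = int(value)
        elif type_code == "f":
            parsed[tag] = float(value)
        else:
            parsed[tag] = value
    return parsed


# Made-up optional fields, as they might appear after the 11 mandatory columns.
example = ["zm:i:1234", "np:i:1", "rq:f:0.85", "RG:Z:abcd1234"]
print(parse_optional_fields(example))
```

Because these are ordinary optional fields, standard tools that read BAM can still process the files and simply ignore tags they do not know.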


Oh, nice! That's much less annoying than h5.
