Question: PacBio Genome assembly
18 months ago, julienlevy0 wrote:

Hello,

I am doing a genome assembly. I have run Canu and I am not sure what to look for in its output. I got 198.87x coverage. (1) How do I evaluate my Canu output?

And these are my read lengths (which seem small?):

Read length histogram (one '*' equals 48989.4 reads):
--        0   4999 3429258 **********************************************************************
--     5000   9999 2856365 **********************************************************
--    10000  14999 2424194 *************************************************
--    15000  19999  919354 ******************
--    20000  24999  341740 ******
--    25000  29999  120067 **

I am going to do a polishing step with pbalign. I also have a transcriptome available that I am planning to BLAST against my assembly.

(2) What is the best tool to use the transcriptome to improve the assembly / annotate the genes? (3) What should I do next?

Thanks

modified 18 months ago by genomax72k • written 18 months ago by julienlevy0
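
As a rough sanity check on the numbers in the question, the histogram can be turned into an approximate read count, total yield, and mean read length. This is only a sketch: it assumes every read sits at the midpoint of its bin and that the six bins shown are the whole histogram (longer reads may have been cut off), and the function name `summarize` is made up for illustration. Dividing the estimated yield by the reported 198.87x coverage gives the genome size that coverage figure implies.

```python
# Bins copied from the read length histogram in the question:
# (bin start, bin end, number of reads).
BINS = [
    (0, 4999, 3429258),
    (5000, 9999, 2856365),
    (10000, 14999, 2424194),
    (15000, 19999, 919354),
    (20000, 24999, 341740),
    (25000, 29999, 120067),
]
REPORTED_COVERAGE = 198.87  # coverage quoted in the question


def summarize(bins, coverage):
    """Estimate read count, total bases, mean read length and implied genome size."""
    total_reads = sum(count for _, _, count in bins)
    # Assumption: every read in a bin has the bin's midpoint length.
    total_bases = sum((start + end + 1) / 2 * count for start, end, count in bins)
    mean_length = total_bases / total_reads
    # coverage = total bases / genome size, so genome size = total bases / coverage.
    implied_genome_size = total_bases / coverage
    return total_reads, total_bases, mean_length, implied_genome_size


reads, bases, mean_len, genome = summarize(BINS, REPORTED_COVERAGE)
print(f"reads:               {reads:,}")
print(f"estimated bases:     {bases / 1e9:.1f} Gb")
print(f"mean read length:    {mean_len:,.0f} bp")
print(f"implied genome size: {genome / 1e6:.0f} Mb")
```

Under these assumptions the reads add up to roughly 87 Gb with a mean length of a bit under 9 kb, and the reported coverage corresponds to a genome of roughly 440 Mb.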

Some threads to consider looking through:
What can I do after my Pacbio genome assembly ?
Polish PacBio assembly with latest PacBio tools : an affordable solution for everyone

Not sure what the size of the genome you are working with is, but MAKER or MAKER-P can be used for annotation.

modified 18 months ago • written 18 months ago by genomax72k

Thanks, the genome is 450gb

modified 18 months ago • written 18 months ago by julienlevy0

Are you sure? Is it really 150x the human genome? Isn't it 450Mb?

written 18 months ago by h.mon27k

You know, H. sapiens is not really the big dude when it comes to genome size :P

written 18 months ago by cschu1811.8k

How come? We are the most complex organism ever created, we surely have the biggest genome, with the most genes, don't we?

On a serious note, though, I just googled for the biggest genome and was flabbergasted to discover genomes in the range of 150-250 billion base pairs. I was stuck at the loblolly pine 22Gb genome.

written 18 months ago by h.mon27k

But 200x coverage of a 450Gb genome is... a whole lot of sequencing data o.O

And bigger than Paris japonica, the biggest genome currently known. So either OP made a mistake or is working on the biggest, meanest, most impressive genome ever.

modified 18 months ago • written 18 months ago by WouterDeCoster41k
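
The arithmetic behind the comments above can be written out in a few lines. This is just an illustration: the human genome size (about 3.1 Gb) and the Paris japonica genome size (about 149 Gb) are approximate values assumed here, not figures from the thread, and `bases_needed` is a made-up helper name.

```python
HUMAN_GENOME_BP = 3.1e9      # assumption: approximate human genome size
PARIS_JAPONICA_BP = 149e9    # assumption: approximate Paris japonica genome size
COVERAGE = 198.87            # coverage reported in the question


def bases_needed(genome_size_bp, coverage):
    """Total sequenced bases required to reach a given coverage."""
    return genome_size_bp * coverage


# Compare the two readings of "450gb": 450 Mb versus 450 Gb.
for label, size in (("450 Mb", 450e6), ("450 Gb", 450e9)):
    total = bases_needed(size, COVERAGE)
    print(
        f"{label}: {total / 1e9:,.1f} Gb of reads, "
        f"{size / HUMAN_GENOME_BP:.1f}x human, "
        f"{size / PARIS_JAPONICA_BP:.2f}x Paris japonica"
    )
```

With these assumed sizes, a 450 Gb genome at that coverage would need on the order of 90 Tb of reads, against roughly 90 Gb for a 450 Mb genome.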