6.1 years ago
julienlevy
Hello,
I am doing a genome assembly and have run Canu, but I am not sure what to look for in its output. I got 198.87x coverage. (1) How do I evaluate my Canu output?
These are my read lengths (which seem small?):
Read length histogram (one '*' equals 48989.4 reads):
-- 0 4999 3429258 **********************************************************************
-- 5000 9999 2856365 **********************************************************
-- 10000 14999 2424194 *************************************************
-- 15000 19999 919354 ******************
-- 20000 24999 341740 ******
-- 25000 29999 120067 **
I am going to do a polishing step with pbalign. I also have a transcriptome available that I am planning to BLAST against my assembly.
(2) What is the best tool for using the transcriptome to improve the assembly / annotate the genes? (3) What should I do next?
thanks
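On question (1), a common first check is contig count, total assembled length and N50. Below is a minimal Python sketch of that check, not anything Canu ships. The helper names `fasta_lengths` and `n50` are made up for illustration, and the file name `asm.contigs.fasta` in the usage note is an assumed path that depends on the prefix you gave Canu.

```python
def fasta_lengths(path):
    """Yield the sequence length of each record in a FASTA file."""
    length = None  # None until the first header line is seen
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                if length is not None:
                    yield length
                length = 0
            elif length is not None:
                length += len(line.strip())
    if length is not None:
        yield length


def n50(lengths):
    """Return the N50: the length of the contig at which the running
    sum of contigs, sorted longest first, reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input
```

Usage would look something like `lengths = list(fasta_lengths("asm.contigs.fasta"))`, then print `len(lengths)`, `sum(lengths)` and `n50(lengths)`. If the total length is far from the expected genome size, that is worth investigating before polishing.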
Some threads to consider looking through:
What can I do after my Pacbio genome assembly ?
Polish PacBio assembly with latest PacBio tools : an affordable solution for everyone
Not sure what is the size of the genome you are working with but MAKER or MAKER-P can be used for annotation.
Thanks, the genome is 450gb
Are you sure? Is it really 150x the human genome? Isn't it 450Mb?
You know, H. sapiens is not really the big dude when it comes to genome size :P
How come? We are the most complex organism ever created, we surely have the biggest genome, with the most genes, don't we?
On a serious note, though, I just googled "biggest genome" and was flabbergasted to discover genomes in the range of 150-250 billion base pairs. I was stuck at the loblolly pine's 22Gb genome. But 200x coverage of a 450Gb genome is... a whole lot of sequencing data o.O
And bigger than Paris japonica, the biggest genome currently known. So either OP made a mistake or is working on the biggest, meanest, most impressive genome ever.
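The Mb-versus-Gb question can be sanity-checked from the numbers in the original post. The sketch below is a rough estimate only: it assumes the six histogram bins shown are the complete histogram (longer bins may have been cut off) and that every read sits at the midpoint of its bin. The function name `estimated_total_bases` is made up for illustration.

```python
# (bin_start, bin_end, read_count) copied from the histogram in the question
HISTOGRAM = [
    (0, 4999, 3429258),
    (5000, 9999, 2856365),
    (10000, 14999, 2424194),
    (15000, 19999, 919354),
    (20000, 24999, 341740),
    (25000, 29999, 120067),
]
COVERAGE = 198.87  # coverage reported in the question


def estimated_total_bases(bins):
    """Approximate total bases by placing every read at its bin midpoint."""
    return sum((start + end) / 2 * count for start, end, count in bins)


total_bases = estimated_total_bases(HISTOGRAM)
implied_genome_size = total_bases / COVERAGE

print(f"estimated total bases: {total_bases / 1e9:.1f} Gb")
print(f"implied genome size:   {implied_genome_size / 1e6:.0f} Mb")
```

Under those assumptions the reads add up to roughly 87 Gb, which at 198.87x coverage implies a genome of about 440 Mb. That is consistent with 450Mb rather than 450Gb; a 450Gb genome at the same coverage would need on the order of 90 Tb of reads.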