Hello Everyone
I have a query regarding tools for de novo assembly of PacBio data. I have plant genome data at 52X coverage. The genome size is around 500 Mb.
I used HGAP (RS_HGAP_assembly2, RS_HGAP_assembly3, RS_preassembly 2) through SMRT Portal, but it gives me the error "ERROR! Reading fasta files greater than 4Gbytes is not supported". It does not seem to support a large genome size.
Then I used FALCON. It ran successfully, but in the assembly folder all the files are empty except preads.ovl and 2-asm-falcon/run_falcon_asm.sh.log.
Can you please suggest a tool for assembling data with 52x coverage and a predicted genome size of 500 Mb?
Thank you
Thank you for the kind reply.
I reran FALCON on the whole data set. It ran successfully, but the output files are only kilobytes in size.
preads.db log
Config file
How many bases are in the 1-preads_ovl/preads4falcon.fasta file? Two things stand out as needing to change. First, if the preads4falcon.fasta file does not contain more than 15x of the expected genome size, then the length_cutoff and length_cutoff_pr parameters should be decreased; how far depends on your library quality and subread size. Second, the --min_cov value in overlap_filtering_setting needs to change; I would set it to 2.
Hey, thanks, I got your point. Now I will rerun the process with the following parameters; just tell me whether they are good to go.
length_cutoff = 500
length_cutoff_pr = 2500
--min_cov 20
After completing the process with the above parameters I will get back to you.
One more thing: if you have any good reading material on FALCON parameters for diploid genomes, please let me know.
Thank you
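For anyone following along, the check asked about earlier in the thread (whether preads4falcon.fasta holds more than 15x of the expected genome size) can be sketched in a few lines of Python. This is only a rough sketch and not part of FALCON: the function names are made up for illustration, and the 500 Mb genome size is the estimate given in this thread.

```python
def count_fasta_bases(path):
    """Sum the sequence characters in a FASTA file, skipping header lines."""
    total = 0
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def fold_coverage(total_bases, genome_size):
    """Fold coverage = total sequenced bases / expected genome size."""
    return total_bases / genome_size


def has_enough_preads(path, genome_size, min_coverage=15):
    """True if the FASTA file holds more than min_coverage x the genome size."""
    return fold_coverage(count_fasta_bases(path), genome_size) > min_coverage


# Example call (assumes the file exists; 500 Mb is the estimate from the thread):
# has_enough_preads("1-preads_ovl/preads4falcon.fasta", 500_000_000)
```

If this returns False, that matches the situation described above where length_cutoff and length_cutoff_pr are worth revisiting.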
I tried the new parameters. Now I get the errors below:
1)
2)
500 and 2500 are too low for the cutoffs. The cutoff should be calculated as the sequence length at which the reads of that length or longer cover ~30x of your expected genome size.
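One way to work out such a cutoff, as a rough sketch: sort the read lengths from longest to shortest, add them up, and take the length at which the running total first reaches 30x of the genome size. The function name below is hypothetical and this is not FALCON code; it assumes you already have the read lengths as a list of integers (for example, extracted from your subreads FASTA).

```python
def length_cutoff_for_coverage(read_lengths, genome_size, target_coverage=30):
    """Return the largest length L such that reads of length >= L together
    reach target_coverage x genome_size bases.

    Returns None if all reads combined fall short of the target.
    """
    target_bases = genome_size * target_coverage
    running_total = 0
    # Walk from the longest read down, accumulating bases.
    for length in sorted(read_lengths, reverse=True):
        running_total += length
        if running_total >= target_bases:
            return length
    return None


# Example call (lengths is a hypothetical list of subread lengths;
# 500 Mb is the genome size estimate from the thread):
# cutoff = length_cutoff_for_coverage(lengths, 500_000_000, target_coverage=30)
```

A return value of None would mean the data set does not contain 30x in total, in which case a lower target coverage has to be used.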