In my previous post, some people suggested that it is better to correct errors in long reads from PacBio Sequel using short reads from Illumina before genome assembly.

I happened to find that the assembly built from short reads alone contains many mis-assembled loci: the assembled sequences do not match our FISH data. We currently suspect that extremely similar repeated sequences dispersed throughout the genome caused the mis-assembly.
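To illustrate why I suspect the repeats, here is a minimal Python sketch on a made-up toy sequence (the function name `repeated_kmers` and the sequences are hypothetical, not our real data). It lists k-mers that occur at more than one position. When k is shorter than the repeat, the repeat's k-mers are ambiguous, which is roughly the situation a short-read assembler faces. When k is longer than the repeat, the ambiguity disappears.

```python
from collections import defaultdict


def repeated_kmers(genome: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer that occurs more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)
    # Keep only k-mers seen at two or more positions.
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


# Toy genome (assumption): the same 12-bp repeat inserted at two loci
# with different flanking sequence.
repeat = "ACGTTGCAAGCT"
genome = "TTAGGC" + repeat + "CCATGA" + repeat + "GGTACA"

# k shorter than the repeat: k-mers inside the repeat map to both loci.
short_k = repeated_kmers(genome, k=8)

# k longer than the repeat: every k-mer includes unique flanking sequence.
long_k = repeated_kmers(genome, k=13)

print(len(short_k), "ambiguous 8-mers;", len(long_k), "ambiguous 13-mers")
```

In this toy case the 8-mers inside the repeat cannot be placed uniquely, while the 13-mers can. That is the same reason I expect long reads that span the repeats to help more than a short-read assembly.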

In this case, I suspect that error-correction software that needs the assembly derived from short reads as input, such as LoRDEC or HALC, would not work well. Is this right?

If so, what kind of software is better to correct errors?

What is the scientific/biological question you are looking for an answer to?

Depending on your question, different initial error-correction bioinformatics pipelines are recommended for PacBio data.

Has anyone used FMLRC, or is HALC better?

The built-in error-correction procedure of the Canu pipeline works quite well. However, it uses only PacBio data, not Illumina data.

