I am experimenting with that right now to determine whether Racon or Pilon polishing "is better" for an assembly in which PacBio reads were used to fill gaps. Briefly, after a single round of polishing with either program, I get similar assembly statistics, RNA-seq mapping rates, indel rates (based on the CIGAR strings of Illumina short-read alignments), and BUSCO scores.
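For anyone who wants to reproduce the CIGAR-based indel comparison, here is a minimal Python sketch of one way to do it. It is an assumption about how such a rate could be computed, not the exact method used above: the function names are hypothetical, and the rate is defined here as insertion plus deletion events per aligned base (M, = and X operations).

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indel_stats(cigar):
    """Return (insertion_events, deletion_events, aligned_bases) for one CIGAR string."""
    insertions = deletions = aligned = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op == "I":
            insertions += 1
        elif op == "D":
            deletions += 1
        elif op in "M=X":
            aligned += int(length)
    return insertions, deletions, aligned


def indel_rate(cigars):
    """Indel events per aligned base over many CIGAR strings (hypothetical metric)."""
    events = aligned_total = 0
    for cigar in cigars:
        ins, dels, aligned = indel_stats(cigar)
        events += ins + dels
        aligned_total += aligned
    # Unmapped reads have CIGAR "*" and contribute nothing.
    return events / aligned_total if aligned_total else 0.0
```

The CIGAR strings would come from column 6 of the SAM records for the short reads mapped to each polished assembly; comparing `indel_rate` between the Racon and Pilon assemblies gives one number per assembly.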
Are you dealing with a large genome (> 2 Gbp)? If so, running Pilon with default settings is trickier, as it requires almost 1 GB of RAM per 1 Mbp of sequence (so about 2 TB of RAM by default for a 2 Gbp genome). I've written some scripts that generate custom "targets" files for each scaffold, so that Pilon runs with less RAM. Racon requires much less memory by default (~500-600 GB) for 65x coverage of Illumina short-read data on a 2 Gbp genome.
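To make the memory arithmetic and the "targets" idea concrete, here is a minimal Python sketch. It is not the scripts mentioned above, only an illustration under stated assumptions: scaffold lengths are read from a `samtools faidx` index (`.fai`), scaffolds are grouped greedily under a base-pair budget, and each group is written as a file with one scaffold name per line, to be passed to Pilon's `--targets` option. The function names and the output file naming are hypothetical.

```python
def estimate_pilon_ram_gb(genome_bp, gb_per_mbp=1.0):
    """Rule of thumb from the text: roughly 1 GB of RAM per 1 Mbp of sequence."""
    return genome_bp / 1_000_000 * gb_per_mbp


def read_fai(path):
    """Read (name, length) pairs from a samtools faidx index file."""
    lengths = []
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            lengths.append((fields[0], int(fields[1])))
    return lengths


def batch_scaffolds(lengths, max_bp_per_batch):
    """Greedily group scaffold names so each batch stays within the bp budget.

    A scaffold longer than the budget ends up in a batch of its own.
    """
    batches, current, current_bp = [], [], 0
    for name, length in lengths:
        if current and current_bp + length > max_bp_per_batch:
            batches.append(current)
            current, current_bp = [], 0
        current.append(name)
        current_bp += length
    if current:
        batches.append(current)
    return batches


def write_targets(batches, prefix="targets"):
    """Write one targets file per batch (hypothetical naming: prefix_0001.txt, ...)."""
    paths = []
    for index, batch in enumerate(batches, start=1):
        path = f"{prefix}_{index:04d}.txt"
        with open(path, "w") as handle:
            handle.write("\n".join(batch) + "\n")
        paths.append(path)
    return paths
```

With a budget of, say, 100 Mbp per batch, each Pilon run would need on the order of 100 GB by the rule of thumb above; the per-batch outputs then have to be concatenated back into one assembly.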
If you are concerned about removing indels, the best way to determine whether Racon or Pilon is better for your case is to run both, then predict proteins and run "ideel" (http://www.opiniomics.org/a-simple-test-for-uncorrected-insertions-and-deletions-indels-in-bacterial-genomes/) to see whether the Racon-polished or the Pilon-polished assembly has a greater proportion of truncated proteins. With Racon, you should be fine with one iteration of polishing. With Pilon, most people use two iterations, but be warned that too many polishing iterations is a bad thing: I found a much higher proportion of truncated proteins in a Pilon assembly polished for 8 rounds than in one polished for 2 rounds. Your results may vary, but that is what I found with a 2 Gbp mammalian genome and 65x Illumina short-read coverage using Pilon.
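The ideel-style comparison boils down to the ratio of each predicted protein's length to the length of its best database hit. A minimal Python sketch of that summary step follows; it assumes you already have (query length, subject length) pairs, for example from a tabular protein alignment output, and the 0.9 cutoff for calling a protein "truncated" is an arbitrary assumption, as is the function name.

```python
def truncated_fraction(hits, threshold=0.9):
    """Fraction of proteins whose query/subject length ratio is below threshold.

    hits: iterable of (query_length, subject_length) pairs, one per predicted
    protein with a best hit. The threshold is an assumed cutoff, not a standard.
    """
    ratios = [qlen / slen for qlen, slen in hits if slen > 0]
    if not ratios:
        return 0.0
    return sum(ratio < threshold for ratio in ratios) / len(ratios)


# Toy, made-up values purely to show the call; not real results.
example_hits = [(100, 100), (50, 100), (95, 100), (80, 100)]
print(truncated_fraction(example_hits))
```

Running this once per polished assembly and comparing the two fractions gives a quick proxy for which polisher left more frameshifting indels.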
modified 20 months ago
20 months ago by
jean.elbers • 1.3k