I am experimenting with that right now to determine whether Racon or Pilon polishing "is better" for an assembly in which PacBio reads were used to fill gaps. Briefly, with a single round of polishing using either program, I get similar assembly statistics, RNA-Seq mapping rates, indel rates (based on CIGAR strings from Illumina short-read alignments), and BUSCO scores.
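The post doesn't say exactly how the indel rates were computed from the CIGAR strings. Here is a minimal sketch, assuming an indel rate is the number of insertion (I) and deletion (D) events per aligned base, where aligned bases are the M/=/X operations. The function names are hypothetical. In practice the CIGAR strings would come from the BAM of Illumina reads mapped to each polished assembly, for example via pysam, which is not shown here.

```python
import re

# One CIGAR operation: a length followed by an op code (SAM spec op set).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def indel_stats(cigars):
    """Sum insertion events, deletion events and aligned bases over CIGAR strings.

    Assumption: each I or D operation counts as one indel event regardless of
    its length; aligned bases are the lengths of M, = and X operations.
    """
    insertions = deletions = aligned = 0
    for cigar in cigars:
        for length, op in CIGAR_RE.findall(cigar):
            n = int(length)
            if op == "I":
                insertions += 1
            elif op == "D":
                deletions += 1
            elif op in ("M", "=", "X"):
                aligned += n
    return insertions, deletions, aligned


def indels_per_100kb(cigars):
    """Indel events per 100 kb of aligned sequence (0.0 if nothing aligned)."""
    insertions, deletions, aligned = indel_stats(cigars)
    if aligned == 0:
        return 0.0
    return (insertions + deletions) / aligned * 100_000
```

Running the same function on alignments to the Racon- and Pilon-polished assemblies gives directly comparable numbers, as long as the same reads and aligner settings are used for both.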
Are you dealing with a large genome (> 2 Gbp)? If so, running Pilon with default settings is trickier, as it requires almost 1 GB of RAM per 1 Mbp of sequence (so ~2 TB of RAM for a 2 Gbp genome). I've written some scripts that generate custom "targets" files for each scaffold so that Pilon runs with less RAM. By default, Racon requires much less memory (~500-600 GB) for 65x coverage of Illumina short-read data on a 2 Gbp genome.
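The scripts mentioned above aren't shown, so this is only a sketch of one way to do it, not the author's method. It applies the ~1 GB per Mbp rule of thumb to estimate RAM, reads scaffold names and lengths from a samtools faidx `.fai` index, and greedily groups scaffolds into batches under a size budget, writing one targets file per batch (one name per line). All function names and the file-naming scheme are hypothetical.

```python
from pathlib import Path

GB_PER_MBP = 1.0  # rule of thumb from the post: ~1 GB RAM per Mbp for Pilon


def estimate_pilon_ram_gb(total_bp, gb_per_mbp=GB_PER_MBP):
    """Rough Pilon RAM estimate in GB for total_bp bases of sequence."""
    return total_bp / 1_000_000 * gb_per_mbp


def read_fai(fai_path):
    """Yield (scaffold_name, length) pairs from a samtools faidx .fai file."""
    with open(fai_path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            yield fields[0], int(fields[1])


def batch_targets(scaffolds, max_bp):
    """Greedily group scaffolds so each batch's total length is <= max_bp.

    A scaffold longer than max_bp ends up alone in its own batch.
    """
    batches, current, current_bp = [], [], 0
    for name, length in scaffolds:
        if current and current_bp + length > max_bp:
            batches.append(current)
            current, current_bp = [], 0
        current.append(name)
        current_bp += length
    if current:
        batches.append(current)
    return batches


def write_targets_files(batches, out_dir, prefix="pilon_targets"):
    """Write one targets file per batch (one scaffold name per line)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, names in enumerate(batches):
        path = out / f"{prefix}_{i:04d}.txt"
        path.write_text("\n".join(names) + "\n")
        paths.append(path)
    return paths
```

For a 2 Gbp genome, `estimate_pilon_ram_gb(2_000_000_000)` gives 2000 GB, matching the ~2 TB figure. With a 100 Mbp budget, each batch should need on the order of 100 GB. Each file can then be passed to its own Pilon job through Pilon's `--targets` option, which treats each line of a file as a target, and the per-batch outputs concatenated afterwards.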
If you are concerned about removing indels, the best way to determine whether Racon or Pilon is better for your case is to run both, then predict proteins and run "ideel" (http://www.opiniomics.org/a-simple-test-for-uncorrected-insertions-and-deletions-indels-in-bacterial-genomes/) to see whether the Racon- or the Pilon-polished assembly has a greater proportion of truncated proteins. With Racon, you should be fine with one iteration of polishing. With Pilon, most people use two iterations, but be warned that too many polishing iterations are a bad thing: I found a much higher proportion of truncated proteins in a Pilon assembly polished for 8 rounds than in one polished for 2 rounds. Your results may vary, but that is what I found with a 2 Gbp mammalian genome and 65x Illumina short-read coverage using Pilon.
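The ideel approach compares each predicted protein's length to the length of its best database hit. Uncorrected indels cause frameshifts and premature stops, so many proteins come out shorter than their hits. Below is a minimal sketch of that comparison, not ideel's own code. It assumes tab-separated hit lines with the columns qseqid, qlen and slen (for example DIAMOND blastp tabular output restricted to those fields), with the best hit listed first for each query. The 0.9 cutoff for calling a protein "truncated" is an arbitrary, hypothetical choice.

```python
import csv


def length_ratios(hit_lines):
    """Map each query protein to qlen/slen of its first (assumed best) hit.

    Expects tab-separated lines with columns: qseqid, qlen, slen.
    """
    ratios = {}
    for row in csv.reader(hit_lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        qseqid, qlen, slen = row[0], int(row[1]), int(row[2])
        if qseqid not in ratios:  # keep only the first hit per query
            ratios[qseqid] = qlen / slen
    return ratios


def truncated_fraction(ratios, cutoff=0.9):
    """Fraction of proteins shorter than cutoff x their best hit's length."""
    if not ratios:
        return 0.0
    return sum(r < cutoff for r in ratios.values()) / len(ratios)
```

Computing `truncated_fraction` separately for the Racon- and Pilon-polished assemblies, against the same protein database, gives one number per assembly to compare. Plotting the full distribution of ratios, as ideel does, shows more than the single cutoff.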
Written 2.6 years ago by jean.elbers
It looks like 3a is the best result here, and that's the one with no polishing.
3a is indeed better, but I didn't mention or emphasize that it was an Illumina-only assembly. 3b and 3c are after adding 10x coverage of PacBio reads with PBJelly: 3b is with two rounds of Pilon polishing, and 3c is with one round of Racon polishing. With further experimentation, I find that the variant caller callvariants.sh from the BBMap/BBTools suite outperforms one round of Pilon in terms of RNA-Seq mapping rates.