Hi, community! I am de novo assembling Nanopore long reads and comparing my draft genome assembly against the publicly available reference genome. First, some details about the inputs. The reference genome was built with a hybrid approach that combined Illumina and PacBio data: the authors assembled the short reads and used the long reads (~30x coverage) for gap filling. My assembly contains complete chromosomes, whereas the reference genome has none. The annotation of the reference genome contains more protein-coding genes than the annotation of my own assembly. Both assemblies are from the same species but from different strains, and they differ in genome size.
When I ran nucmer (from MUMmer) with the -maxmatch option, followed by delta-filter, I got an average identity of 93%. How is this possible? I find it difficult to understand because:
1) My assembly had ~47x coverage, whereas according to the literature 70x coverage is needed to overcome the systematic errors of Nanopore, so my assembly still contains errors even though I ran many error-correction, consensus, and polishing steps.
2) The Nanopore assembly spans more repetitive regions than the reference genome does.
So, in general, there are several reasons why I would have expected a somewhat lower identity percentage.
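To make the "average identity" concrete: it can be taken either as a plain mean over alignments or as a mean weighted by alignment length, and the two can differ noticeably when short, divergent alignments are present. Below is a minimal sketch in Python. It assumes tab-delimited alignment rows in the column order S1, E1, S2, E2, LEN1, LEN2, %IDY, reference ID, query ID; that layout, the function name `average_identity`, and the two example rows are assumptions for illustration only, not my actual output.

```python
def average_identity(rows_text):
    """Return (plain_mean, length_weighted_mean) of percent identity.

    Assumed column order (tab-delimited):
    S1, E1, S2, E2, LEN1, LEN2, %IDY, ref_id, query_id
    """
    identities = []
    lengths = []
    for line in rows_text.strip().splitlines():
        fields = line.split("\t")
        if len(fields) < 7:
            continue  # skip lines that are too short to be alignment rows
        try:
            ref_len = int(fields[4])   # LEN1: alignment length on the reference
            idy = float(fields[6])     # %IDY: percent identity of this alignment
        except ValueError:
            continue  # skip header or other non-numeric lines
        identities.append(idy)
        lengths.append(ref_len)

    if not identities:
        raise ValueError("no alignment rows found")

    plain = sum(identities) / len(identities)
    weighted = sum(i * l for i, l in zip(identities, lengths)) / sum(lengths)
    return plain, weighted


# Hypothetical rows: one long, near-identical alignment and one short, divergent one.
example = "\n".join([
    "1\t1000\t1\t1000\t1000\t1000\t99.0\tchr1\tcontig1",
    "2001\t2100\t5001\t5100\t100\t100\t80.0\tchr1\tcontig2",
])

plain, weighted = average_identity(example)
print(f"plain mean: {plain:.2f}%  length-weighted mean: {weighted:.2f}%")
```

In this made-up example the short, divergent alignment pulls the plain mean down much more than the length-weighted mean, so which definition is behind a reported figure matters when interpreting it.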
Let me know if I made myself clear. Thank you in advance for your help.