I am currently using nucmer for a genome vs genome alignment. I'm tweaking these parameters:
-b|breaklen = Set the distance an alignment extension will attempt to extend poor scoring regions before giving up (default 200)
-l|minmatch = Set the minimum length of a single match (default 20)
-c|mincluster = Sets the minimum length of a cluster of matches (default 65)
-g|maxgap = Set the maximum gap between two adjacent matches in a cluster (default 90)
I tried many different parameter sets and, for my data, I think I found a direction to go, but I'd like to know if you agree and/or there are other things to consider.
I am mapping 2 related species, where one is the best candidate parental species for the other, which resulted from hybridization. Therefore, I would expect roughly 40-60% of the scaffolds to have an extended match, and the rest to match much less extensively. Do you agree?
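To make the 40-60% expectation checkable, here is a minimal Python sketch of how the fraction of scaffolds with an "extended match" could be counted. It assumes the alignments have already been parsed into (scaffold, start, end) tuples, for example from a coordinates table. The function names and the 0.5 coverage cutoff that defines "extended" are hypothetical choices for illustration, not part of nucmer.

```python
from collections import defaultdict


def covered_fraction(intervals, length):
    """Fraction of a scaffold covered by the union of 1-based inclusive intervals."""
    covered = 0
    current_end = 0
    # Normalise reverse-strand hits (start > end) and sweep left to right.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        start = max(start, current_end + 1)  # skip the part already counted
        if end >= start:
            covered += end - start + 1
            current_end = end
    return covered / length


def fraction_with_extended_match(alignments, scaffold_lengths, min_coverage=0.5):
    """Share of scaffolds whose aligned coverage reaches min_coverage.

    alignments: iterable of (scaffold_name, start, end) tuples.
    scaffold_lengths: dict mapping scaffold_name -> length in bp.
    min_coverage: hypothetical cutoff for calling a match "extended".
    """
    by_scaffold = defaultdict(list)
    for scaffold, start, end in alignments:
        by_scaffold[scaffold].append((start, end))
    hits = sum(
        1
        for name, length in scaffold_lengths.items()
        if covered_fraction(by_scaffold[name], length) >= min_coverage
    )
    return hits / len(scaffold_lengths)
```

Overlapping alignments are merged before counting, so repeats that align to the same region several times do not inflate the coverage of a scaffold.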
Both are highly repetitive plant genomes. My plan is to set a high breaklen and a high maxgap to account for insertions and rearrangements in general, and to set mincluster and minmatch higher than the defaults so that I retrieve only matches that could correspond to synteny blocks.
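For concreteness, this is a small Python sketch of how such a run could be assembled. It uses only the four options listed above plus nucmer's -p output prefix. The helper name, the file names, and the raised values are placeholders I made up to illustrate the plan, not tested recommendations.

```python
import shlex


def build_nucmer_command(reference, query, prefix,
                         breaklen=200, minmatch=20, mincluster=65, maxgap=90):
    """Return the nucmer argument list; keyword defaults mirror the documented defaults."""
    return [
        "nucmer",
        "-p", prefix,            # prefix for the output files
        "-b", str(breaklen),     # how far to extend through poor-scoring regions
        "-l", str(minmatch),     # minimum length of a single match
        "-c", str(mincluster),   # minimum length of a cluster of matches
        "-g", str(maxgap),       # maximum gap between adjacent matches in a cluster
        reference,
        query,
    ]


# Hypothetical file names and illustrative values (all raised above the defaults).
command = build_nucmer_command(
    "parent.fasta", "hybrid.fasta", "parent_vs_hybrid",
    breaklen=500, minmatch=50, mincluster=200, maxgap=500,
)
print(shlex.join(command))
```

The list form can be passed directly to `subprocess.run(command, check=True)`, which makes it easy to loop over several parameter sets and give each run its own prefix.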
EDIT: After 15 test runs I could clearly see that the -l option has by far the largest effect on the output, outweighing all the other options.