Question: Best set of nucmer mapping parameters for genome vs genome alignment
23 months ago by Macspider (Vienna - BOKU):

Hi all,

I am currently using nucmer for a genome vs genome alignment. I'm tweaking these parameters:

  • -b|breaklen = Set the distance an alignment extension will attempt to extend poor scoring regions before giving up (default 200)

  • -l|minmatch = Set the minimum length of a single match (default 20)

  • -c|mincluster = Sets the minimum length of a cluster of matches (default 65)

  • -g|maxgap = Set the maximum gap between two adjacent matches in a cluster (default 90)

I tried many different parameter sets and, for my data, I think I found a direction to go, but I'd like to know if you agree and/or there are other things to consider.

I am aligning two related species, one of which is the best candidate parental species of the other, which arose by hybridization. I would therefore expect roughly 40-60% of the scaffolds to have an extended match, and the rest to match less extensively. Do you agree?
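To make the 40-60% expectation checkable, here is a minimal sketch of how the fraction of scaffolds with an extended match could be computed. It assumes the alignments have already been exported as (scaffold, start, end) tuples with 1-based inclusive coordinates; the function names, the example data, and the 50% coverage cutoff for "extended match" are all my own assumptions, not something nucmer defines.

```python
from collections import defaultdict


def covered_fraction(intervals, length):
    """Fraction of a scaffold covered by the union of 1-based inclusive intervals."""
    covered = 0
    cur_start = cur_end = None
    # Normalise reverse-strand intervals (start > end) before sorting.
    for start, end in sorted((min(a, b), max(a, b)) for a, b in intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / length


def fraction_with_extended_match(lengths, alignments, min_cov=0.5):
    """Share of scaffolds whose aligned coverage is at least min_cov (assumed cutoff)."""
    by_scaffold = defaultdict(list)
    for name, start, end in alignments:
        by_scaffold[name].append((start, end))
    hits = sum(
        1
        for name, length in lengths.items()
        if covered_fraction(by_scaffold.get(name, []), length) >= min_cov
    )
    return hits / len(lengths)


# Hypothetical toy data: three scaffolds, one alignment on the reverse strand.
lengths = {"scf1": 1000, "scf2": 1000, "scf3": 500}
alignments = [("scf1", 1, 400), ("scf1", 300, 700), ("scf2", 900, 950), ("scf3", 500, 1)]
result = fraction_with_extended_match(lengths, alignments)
print(f"{result:.2f} of scaffolds have an extended match")
```

Overlapping alignments are merged before counting, so repeats that align several times to the same region do not inflate the coverage.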

Both are highly repetitive plant genomes. My plan is to set a high breaklen and a high maxgap to tolerate insertions and rearrangements, and to raise mincluster and minmatch above their defaults so that only matches that could correspond to synteny blocks are retained.
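As a concrete illustration of that plan, here is a minimal sketch that builds a nucmer command line with all four options raised above their defaults. The file names, the output prefix, and the specific values (500, 50, 200, 500) are placeholders I made up for illustration, not recommended settings; `-p` is nucmer's output prefix option.

```python
import shlex


def nucmer_command(ref, qry, prefix, breaklen=200, minmatch=20, mincluster=65, maxgap=90):
    """Build a nucmer argument list; the defaults mirror the ones listed above."""
    return [
        "nucmer",
        "-p", prefix,           # output prefix for the .delta file
        "-b", str(breaklen),    # how far to extend through poor-scoring regions
        "-l", str(minmatch),    # minimum length of a single match
        "-c", str(mincluster),  # minimum length of a cluster of matches
        "-g", str(maxgap),      # maximum gap between adjacent matches in a cluster
        ref,
        qry,
    ]


# Hypothetical inputs and illustrative (not tuned) values.
cmd = nucmer_command(
    "parent.fasta", "hybrid.fasta", "parent_vs_hybrid",
    breaklen=500, minmatch=50, mincluster=200, maxgap=500,
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` once nucmer is on the PATH.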

Any suggestions?

EDIT: After 15 test runs I could clearly see that the -l option affects the output far more than any of the other options.
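One way to confirm which option dominates is to change a single option per run while holding the others at their defaults. The sketch below only generates such command lines; the grid values and file names are hypothetical, and it is not the exact set of 15 runs mentioned above.

```python
# Defaults as listed in the question.
DEFAULTS = {"-b": 200, "-l": 20, "-c": 65, "-g": 90}


def one_at_a_time(grid, defaults=DEFAULTS):
    """Yield (flag, params) pairs in which exactly one flag differs from its default."""
    for flag, values in grid.items():
        for value in values:
            if value == defaults[flag]:
                continue  # skip the all-defaults combination
            params = dict(defaults)
            params[flag] = value
            yield flag, params


# Hypothetical test values for each option.
grid = {"-l": [20, 50, 100], "-b": [200, 500], "-c": [65, 200], "-g": [90, 500]}
runs = list(one_at_a_time(grid))

for flag, params in runs:
    prefix = "run_" + "_".join(f"{k.lstrip('-')}{v}" for k, v in params.items())
    options = [token for k, v in params.items() for token in (k, str(v))]
    args = ["nucmer", "-p", prefix] + options + ["parent.fasta", "hybrid.fasta"]
    print(" ".join(args))
```

Comparing each run against the all-defaults run then shows how much of the change in output is attributable to each option on its own.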
