5.5 years ago by
France, Lille, CNRS
In general, you want to set an abundance threshold X so that every correct k-mer appears X times or more in the dataset, while not too many erroneous k-mers are seen X times or more. If you look at the k-mer abundance histogram (generated by Kmergenie or any k-mer counter), a reasonable abundance threshold is near the first "valley" (local minimum) of the histogram: erroneous k-mers pile up at very low abundances, to the left of the valley, and correct k-mers form the peak to its right.
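To make the "first valley" idea concrete, here is a minimal Python sketch. It assumes you have already loaded the histogram as a list of (abundance, count) pairs sorted by abundance; the function name `first_valley` and the example counts are made up for illustration and are not Kmergenie output or its actual method.

```python
def first_valley(hist):
    """Return the abundance at the first local minimum of the histogram.

    hist: list of (abundance, count) pairs, sorted by increasing abundance.
    Returns None if the counts never go down and then back up.
    """
    for i in range(1, len(hist) - 1):
        prev_count = hist[i - 1][1]
        count = hist[i][1]
        next_count = hist[i + 1][1]
        # A valley: no higher than the previous bin, strictly lower than the next.
        if count <= prev_count and count < next_count:
            return hist[i][0]
    return None


# Made-up histogram: error k-mers dominate at abundance 1-3,
# the genomic peak is around abundance 7.
example = [(1, 1000), (2, 300), (3, 80), (4, 40),
           (5, 60), (6, 120), (7, 200), (8, 150)]

threshold = first_valley(example)
print(threshold)
```

On real data the histogram can be noisy, so the first strict local minimum may be a small bump rather than the true valley. Treat the result as a starting point and check it against the plot.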
For high-coverage datasets, the abundance threshold should be high. I can't give a specific number, as it depends heavily on the dataset, but it is generally in the range 5-20. For low-coverage datasets, a threshold of 2 or 3 is generally good.
Kmergenie offers an experimental feature that determines an abundance parameter for you. It's not in the HTML report yet, but you can see it in the command line output. Give it a try! I've had good results with it so far.