Edit distance is favored by computer scientists because it is well defined, but it is not always a good standard for biological data. A true hit with an 11bp indel will be counted as false if we only allow 10 differences. Even when a true hit has a 10bp indel, the best hit under edit-distance scoring may differ from the biologically correct one. Taking RazerS, an edit-distance-based mapper, as the ground truth is therefore flawed to some extent. Also, RazerS is not splicing-aware if I am right; how is it used to evaluate RNA-seq mappers such as STAR and segemehl?
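To make the indel point concrete, here is a minimal sketch (hypothetical 100bp sequences and a textbook Levenshtein DP, not any mapper's actual scoring) showing that a read differing from the reference only by an 11bp deletion already has edit distance 11, so a 10-difference cutoff rejects it:

```python
def edit_distance(a, b):
    """Classic Levenshtein DP over two strings, O(len(a)*len(b))."""
    prev = list(range(len(b) + 1))          # row for the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]                           # deleting i characters of a
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,     # delete ca
                           cur[j - 1] + 1,  # insert cb
                           prev[j - 1] + (ca != cb)))  # match/mismatch
        prev = cur
    return prev[-1]

ref = "ACGT" * 25               # hypothetical 100bp reference segment
read = ref[:50] + ref[61:]      # identical except for an 11bp deletion
print(edit_distance(read, ref)) # 11 -- over a 10-difference threshold
```

Since the two sequences differ in length by 11 and each edit changes the length by at most one, the distance cannot be below 11, so no 10-difference cutoff can recover this hit regardless of how the alignment is scored.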
More generally, most mapper benchmarks are biased by the views of their designers or of the mapper developers themselves. When I want to know the relative performance of two mappers, I tend to read multiple papers that evaluate both but are not written by the developers of either. For example, a paper describing a new mapper "A" evaluates older mappers B, C, D and E; a paper describing B evaluates C, E and F. Then we have two relatively unbiased observations of C and E. I see the ensemble of benchmarks as a better benchmark than any individual one, including any benchmark I run myself. segemehl is an old mapper but is rarely evaluated by others, which makes it hard for me to tell where it stands.
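The counting argument above can be sketched as follows (the papers and mappers are the hypothetical A–F from the text; an observation of a mapper counts as "unbiased" here simply when the evaluating paper's own mapper is a different one):

```python
from collections import Counter

# paper's own mapper -> set of mappers it evaluates (hypothetical data)
papers = {
    "A": {"B", "C", "D", "E"},  # paper introducing A evaluates B, C, D, E
    "B": {"C", "E", "F"},       # paper introducing B evaluates C, E, F
}

# count third-party (non-self) observations per mapper
unbiased = Counter(m for own, evaluated in papers.items()
                   for m in evaluated if m != own)

# mappers with at least two relatively unbiased observations
print(sorted(m for m, n in unbiased.items() if n >= 2))  # ['C', 'E']
```

With more papers in the table, the same tally shows which mappers, like segemehl, have few or no third-party observations at all.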
As Asaf pointed out, mapping is only an intermediate step. These days, I also prefer to see how mapping affects downstream analyses such as variant calling and expression quantification. A complication is that a downstream tool is often designed with specific mappers in mind. For example, GATK usually performs better on BWA alignments than on Bowtie2 alignments, even though Bowtie2 and BWA are similar in accuracy by other standards. For another example, a preprint (I forget which) claims that Cufflinks works better with TopHat2 mappings even though STAR is shown to be more accurate by the same authors. Nonetheless, for RNA-seq variant calling, the GATK team recommends STAR over TopHat2. Sometimes we are not choosing the best single tool, but the chain of tools that works best together.