Just to comment a bit on Devon's (accurate & ideal) response. We've actually been looking into the fine-grained particulars of this, and have a paper under submission that explores the nature of these small differences in some detail. There are at least two potential reasons for the differences one observes in practice. By running Salmon in alignment-based mode, we can set aside the alignment vs. "mapping" difference and just look at how accuracy (as judged by the useful, but less-than-perfect, metric of accuracy on data simulated using RSEM-derived quantifications) varies between methods.
One hypothesis that withstood our testing is that the difference can be explained by the fact that certain methods factorize the likelihood being optimized (i.e., when running their EM / VBEM procedures, they do not consider each fragment independently, but group certain fragments together for the purposes of quantification). We were also able to derive new factorizations with very similar performance (in terms of time / memory requirements) to the ones currently used, yet which don't exhibit easily measurable accuracy differences from methods, like RSEM, that optimize an un-factorized (or full) likelihood. That is, there are groupings that factorize the likelihood in a different (and more data-dependent) way, and that are more faithful to the un-factorized likelihood. I'll note here that there are also small differences attributable to traditional alignment vs. fast mapping strategies. We are investigating these further as well, though one must be particularly careful here not to bias one's validation, since simulated data is often generated using a model that incorporates (and encodes important information in) alignment characteristics.
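To make the idea of a factorized likelihood concrete, here is a minimal sketch (not Salmon's or RSEM's actual implementation) of an EM update over transcript-level equivalence classes: fragments compatible with the same set of transcripts are collapsed into a single class with a count, so the E-step iterates over classes rather than over individual fragments. The class definitions, toy counts, and uniform conditional probabilities are illustrative assumptions; real methods would also weight by effective lengths and fragment-level alignment probabilities.

```python
def em_factorized(eq_classes, num_transcripts, n_iters=500):
    """EM over a factorized likelihood: each equivalence class is a
    (transcript_set, fragment_count) pair, so each E-step touches each
    class once instead of each fragment once."""
    # Start from a uniform abundance estimate.
    abund = [1.0 / num_transcripts] * num_transcripts
    for _ in range(n_iters):
        expected = [0.0] * num_transcripts  # expected fragment counts
        for transcripts, count in eq_classes:
            denom = sum(abund[t] for t in transcripts)
            if denom == 0.0:
                continue
            # Allocate the class's fragments proportionally to current abundances.
            for t in transcripts:
                expected[t] += count * abund[t] / denom
        total = sum(expected)
        abund = [e / total for e in expected]
    return abund

# Toy example: 3 transcripts, fragments grouped by the set of transcripts
# they are compatible with (counts are made up for illustration).
eq_classes = [
    ({0}, 50),      # fragments unique to transcript 0
    ({0, 1}, 30),   # fragments ambiguous between transcripts 0 and 1
    ({1, 2}, 20),   # fragments ambiguous between transcripts 1 and 2
    ({2}, 10),
]
print(em_factorized(eq_classes, num_transcripts=3))
```

The "new factorizations" mentioned above would, in this picture, correspond to choosing richer, more data-dependent groupings (so that fragments in the same class really are interchangeable for the model), bringing the factorized objective closer to the full, per-fragment likelihood at essentially the same computational cost.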
Regarding (2): the reasoning behind this is likely more historical than anything else. RSEM incorporates a model of alignment that simply doesn't model insertions or deletions, though there is nothing inherent about the method itself that precludes this. For example, Salmon (in alignment-based mode), Tigar, eXpress, BitSeq and many other tools support insertions and deletions in the alignments. RSEM will simply not process samples with indels in the alignments (that's why, if you use the built-in wrapper scripts to process the reads, RSEM will run Bowtie2 in a manner that disallows insertions or deletions in the mappings). There has not, to my knowledge, been a detailed study of the effect this has on accuracy in different cases. In most common cases, one would expect indels to be rather rare and, therefore, the effect of ignoring them to be rather small. On the other hand, it certainly seems possible that, if important (unknown) indels exist, allowing reads to align / map over them could improve quantification. Existing alignment-free methods will map (and account for in quantification) reads that exhibit indels with respect to the reference.
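As a concrete illustration of the indel point, the sketch below (a hypothetical helper, not part of RSEM or any of the tools named above) just scans a CIGAR string for insertion/deletion operations, which is essentially the property that an indel-intolerant alignment model cannot handle; an indel-aware model would instead assign such alignments a probability rather than rejecting them.

```python
import re

# CIGAR operations as they appear in SAM/BAM records, e.g. "30M2I68M".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def has_indel(cigar):
    """Return True if the CIGAR string contains an insertion (I) or
    deletion (D) relative to the reference."""
    return any(op in ("I", "D") for _, op in CIGAR_OP.findall(cigar))

# Toy alignments: the first two contain indels and would be problematic for an
# indel-intolerant model; the last is a plain gapless (match/mismatch) alignment.
for cigar in ["30M2I68M", "50M3D50M", "100M"]:
    print(cigar, "->", "contains indel" if has_indel(cigar) else "gapless")
```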