I am working on mapping Illumina reads with Mosaik. From the manuals and literature, I learned that the hash size plays a role in mapping efficiency against the reference, as well as in the computational time of a run.
What I learned (correct me if I am wrong): a larger hash size means less efficient (less sensitive) mapping but a shorter run time.
But when I tried to standardise the hash size for mapping to the Drosophila genome, with no mismatches allowed and complete (100%) alignment of reads to the reference, I got far more reads mapped at the larger hash size (hs 17, compared to hs 16 down to hs 11), and the run was many times faster.
I am a little sceptical about this observation. Does not allowing any mismatches increase mapping efficiency? And does the number of hash positions per seed also have an effect? I would appreciate some light on this topic.
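For intuition on why a longer hash could be both faster and, in the zero-mismatch case, no less sensitive, here is a toy sketch (not Mosaik's actual implementation; the reference, read, and seed sizes are made up for illustration). It indexes a random reference with k-mer seeds of two sizes and counts how many candidate positions each read's seeds pull in. With no mismatches allowed, a longer exact seed still hits the read's true location, but it collects far fewer spurious candidate positions, so there is less work per read:

```python
from collections import defaultdict
import random

random.seed(0)  # reproducible toy "genome"
ref = "".join(random.choice("ACGT") for _ in range(500_000))

def build_index(ref, k):
    """Map every k-mer (the 'hash') to the positions where it occurs."""
    idx = defaultdict(list)
    for i in range(len(ref) - k + 1):
        idx[ref[i:i + k]].append(i)
    return idx

def candidate_count(idx, read, k):
    """Total candidate reference positions the aligner must examine."""
    return sum(len(idx.get(read[i:i + k], ()))
               for i in range(len(read) - k + 1))

# A 36 bp read copied straight from the reference: the zero-mismatch case,
# so every seed of either size still matches its true location exactly.
read = ref[1000:1036]

c11 = candidate_count(build_index(ref, 11), read, 11)
c17 = candidate_count(build_index(ref, 17), read, 17)
print(f"hs=11 candidates: {c11}, hs=17 candidates: {c17}")
```

On this toy genome, the 17-mer index yields fewer candidates than the 11-mer index while still covering the read's true position, which is consistent with the observation that a larger hash size can run faster without losing perfect-match hits.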
Thanks @mrawlins for the clarification. From hs 18 onwards the number of hits started decreasing. Also, in my case, when I selected a "single" hash position per seed, the number of hits was higher than with "multi" (9) or "all".