A bit of background: I have started exploring Salmon as an RNA quantification tool for miRNA-Seq datasets.
My previous experience with Salmon quantification of mRNA-Seq suggests it is more accurate than the more traditional align-and-count (AaC) strategies, but I have noticed some rare instances where Salmon estimates many orders of magnitude more reads mapping to a gene than my AaC method (STAR+VERSE).
When I quantify miRNA-Seq datasets against references built from miRBase, using either mature miRNA or miRNA primary transcript sequences, I notice the same trend: many transcripts (usually the less abundant ones) have much higher estimated read counts from Salmon than from AaC. The more abundant miRNA species are much more consistent between methods.
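To make the comparison concrete, here is a minimal sketch of how the discordant transcripts could be flagged. The function name, the pseudocount, and the log2 threshold are arbitrary choices for illustration, and the counts are made-up toy numbers with placeholder transcript names, not my actual data.

```python
import math


def flag_discordant(salmon_counts, aac_counts, pseudocount=1.0, min_abs_log2=2.0):
    """Return {transcript: log2(Salmon / AaC)} for transcripts whose two
    estimates differ by at least `min_abs_log2` in absolute log2 ratio.

    Transcripts missing from one table are treated as having zero reads there.
    The pseudocount avoids division by zero for unobserved transcripts.
    """
    flagged = {}
    for name in sorted(set(salmon_counts) | set(aac_counts)):
        s = salmon_counts.get(name, 0.0)
        a = aac_counts.get(name, 0.0)
        log2_ratio = math.log2((s + pseudocount) / (a + pseudocount))
        if abs(log2_ratio) >= min_abs_log2:
            flagged[name] = log2_ratio
    return flagged


# Toy numbers only (hypothetical transcript names, not real miRBase IDs).
toy_salmon = {"miR-A": 10000.0, "miR-B": 480.0, "miR-C": 95.0}
toy_aac = {"miR-A": 9800.0, "miR-B": 500.0, "miR-C": 2.0}

discordant = flag_discordant(toy_salmon, toy_aac)
for name, ratio in discordant.items():
    print(f"{name}\tlog2(Salmon/AaC) = {ratio:.2f}")
```

In this toy example only the low-abundance transcript is flagged, which mirrors the pattern I see: the abundant species agree closely while the rare ones diverge.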
Since this is "live" data, I really have no idea how to assess which method is more accurate, so the best I can do is optimize my Salmon parameters for miRNA-Seq data and then follow up later in wet lab experiments.
I think I understand the Salmon method at a high level, but not deeply enough to know how the various parts of the inference algorithm may influence the results specifically for miRNA-Seq data.
My questions are:
Can anyone give any advice or recommendations on which parameters to use for both index creation and quasi-mapping, other than setting the k-mer size sufficiently low (I used 11) when creating the index?
Are there any considerations I should take into account when interpreting miRNA-Seq read estimates vis-à-vis mRNA-Seq read estimates?