Some thoughts related to your questions:
If one will probably map the reads in order to visualize downstream via tools like IGV, is it better to use the alignment based mapping so that Salmon agrees with the alignment output?
Alignment-based Salmon works with mappings of the reads against the transcriptome, not the genome, and there aren't many tools for visualizing alignments directly against the transcriptome. Moreover, such a visualization would show how the multi-mapping alignments pile up before Salmon resolves the multi-mapping. Generally, there aren't really good tools for this type of visualization (though we are working on some tools / features along these lines). That said, if you map with STAR, you can have it write out the alignments in both genomic and transcriptomic coordinates (the latter via its --quantMode TranscriptomeSAM option); the former file could be visualized with e.g. IGV, while the latter could be provided to Salmon as input. If you're planning to do alignment with STAR anyway, it's reasonable to provide the transcriptomic alignments to Salmon as input.
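As a rough sketch of that STAR-then-Salmon workflow, the Python snippet below only assembles the two command lines; it does not run anything. The function name, the file paths, and the thread count are placeholders I made up for illustration. It also assumes that transcripts.fa contains the same transcripts as the annotation used to build the STAR index, and that the reads are in an uncompressed FASTQ file.

```python
import shlex


def build_commands(star_index, reads, transcripts_fa, prefix, threads=8):
    """Assemble (but do not run) STAR and Salmon command lines.

    All arguments are hypothetical placeholders for your own paths.
    """
    star_cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", star_index,
        "--readFilesIn", reads,
        # Genomic alignments, coordinate-sorted, suitable for IGV after indexing.
        "--outSAMtype", "BAM", "SortedByCoordinate",
        # Additionally write alignments translated to transcript coordinates.
        "--quantMode", "TranscriptomeSAM",
        "--outFileNamePrefix", prefix,
    ]
    # STAR names the transcriptome-coordinate output after the chosen prefix.
    transcriptome_bam = prefix + "Aligned.toTranscriptome.out.bam"
    salmon_cmd = [
        "salmon", "quant",
        "-t", transcripts_fa,      # transcript sequences the reads were aligned to
        "-l", "A",                 # let Salmon infer the library type
        "-a", transcriptome_bam,   # alignment-based mode
        "-o", prefix + "salmon",
    ]
    return star_cmd, salmon_cmd


if __name__ == "__main__":
    star, salmon = build_commands("star_index", "reads.fastq", "transcripts.fa", "sample_")
    print(shlex.join(star))
    print(shlex.join(salmon))
```

The coordinate-sorted genomic BAM (here sample_Aligned.sortedByCoord.out.bam) is the file you would load in IGV, while the transcriptome BAM goes to Salmon.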
Our experiments involve looking at ribosome profiling data when translation is disrupted - resulting in low read counts. We also have the corresponding RNA-Seq data. Should one use gcbias and seqbias for one/both data sets? My idea was to not use either option to avoid introducing bias into our already limited ribosome profiling data set.
I'm not quite sure I understand the use-case here. Salmon is designed for processing RNA-seq data, not ribosome profiling data. While the multi-mapping problem is present in both types of data, there are fundamentally different modeling assumptions at play in ribosome profiling, as read generation is affected both by the underlying transcript abundance and by the dynamics / speed of the translation process. As far as I am aware, there isn't a method or tool currently available that fully models ribosome profiling at the transcript level.
Could someone provide a better explanation for fldMean and fldSD? We are using Illumina HiSeq 50 bp single end reads. Would changing the default settings be more useful? I had a hard time understanding the documentation / comments in Source Code
With paired-end data, the empirical distribution of fragment lengths can be inferred directly from the data. This is not possible with single-end data, so there you need to provide estimates of what the fragment length distribution in your experiment looked like. These can be taken from e.g. the Bioanalyzer scans that accompanied the sequencing experiment. For single-end data, the fldMean and fldSD parameters tell Salmon the mean and standard deviation of the fragment length distribution it should assume. This distribution, in turn, affects how Salmon computes the effective lengths of the transcripts (which, in turn, can affect the estimated abundances).
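To make the link between the assumed fragment length distribution and effective length more concrete, here is a minimal Python sketch. It uses one common textbook definition (transcript length minus the mean fragment length, where the mean is taken over a Gaussian truncated to lengths that fit in the transcript, plus one). This is a simplification for illustration and not Salmon's actual implementation; the function name and the max_frag_len cap are my own assumptions, and the mean/SD values in the example are purely illustrative.

```python
import math


def effective_length(transcript_len, fld_mean, fld_sd, max_frag_len=1000):
    """Approximate effective length under an assumed Gaussian fragment
    length distribution (simplified sketch, not Salmon's code)."""
    # A fragment cannot be longer than the transcript it came from.
    upper = min(transcript_len, max_frag_len)
    lengths = range(1, upper + 1)
    # Unnormalized, discretized Gaussian weights for each fragment length.
    weights = [math.exp(-0.5 * ((l - fld_mean) / fld_sd) ** 2) for l in lengths]
    total = sum(weights)
    if total == 0.0:
        # The assumed distribution puts no mass on any feasible length.
        return 1.0
    # Mean fragment length, conditioned on the fragment fitting the transcript.
    mean_frag = sum(l * w for l, w in zip(lengths, weights)) / total
    # Number of start positions available to a fragment of average length.
    return transcript_len - mean_frag + 1


if __name__ == "__main__":
    for mean in (200, 250, 300):
        print(mean, round(effective_length(2000, mean, 25), 1))
```

Under this definition, a larger assumed mean fragment length gives a smaller effective length, and the effect is proportionally strongest for short transcripts. That is why it is worth supplying values that reflect your library rather than relying on the defaults.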
While I believe I generally understand the kmer parameter in alignment free mapping, I don't quite understand how it works for reads with lengths < kmer length. I ask because our ribosomal fragments range from 12-36 nt but I get maximum mapping percentage at k=17. Are the smaller fragments simply excluded? (I forgot to check if # unmapped = total reads with lengths < 17, I will check and edit).
Reads of length < k will not be mapped. Salmon will actually print to its log (by default both to stderr and to the log file) the fraction of observed reads that were shorter than k. Mapping for these reads will not even be attempted. However, please see the caveats above about processing ribosome profiling data.
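If you want to check this independently of Salmon's log, a small Python sketch like the one below counts the fraction of reads shorter than k. It assumes standard 4-line FASTQ records (no wrapped sequence lines); the function names and the file name in the usage example are hypothetical.

```python
import gzip


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def fraction_shorter_than_k(fastq_lines, k):
    """Return the fraction of reads whose sequence is shorter than k.

    fastq_lines is any iterable of lines in 4-line FASTQ format.
    """
    total = 0
    short = 0
    for i, line in enumerate(fastq_lines):
        # The sequence is the second line of every 4-line record.
        if i % 4 == 1:
            total += 1
            if len(line.strip()) < k:
                short += 1
    return short / total if total else 0.0


if __name__ == "__main__":
    import sys
    # Usage: python count_short.py reads.fastq.gz 17
    with open_fastq(sys.argv[1]) as handle:
        print(fraction_shorter_than_k(handle, int(sys.argv[2])))
```

Comparing this number with the unmapped fraction at a given k shows how much of the mapping loss comes simply from reads being too short to be considered.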