My question is whether it is more appropriate to give Salmon a single concatenated fastq file or multiple sequencer- and read-length-specific fastq files when the reads for a given sample were generated at different times with different read lengths. The Salmon documentation is clear about Salmon's ability to accommodate concatenated fastq files from a single library, but I'm concerned about the effect of varying read lengths on quantification.
My motivation for this question is that I have a dataset generated over several years in which certain samples with insufficient read depth were sequenced multiple times, and the different sets of reads were concatenated into single, sample-specific fastq files. I could load the single, concatenated fastq file for a given sample into Salmon, or I could decompose that file into multiple sequencer- and read-length-specific fastq files and load them separately into Salmon. (I could also decompose them and then load them together into Salmon using the multiple read file approach referenced in the documentation, but I will resist that temptation.) My concern with the first (single file) approach is that Salmon would apply to all reads a quantification scheme that is only applicable to a subset of them. My concern with the second (multiple file) approach is the converse: multiple schemes would be applied when a single scheme would be more appropriate.
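To make the decomposition step concrete, here is a minimal Python sketch of how I might split a concatenated fastq file by read length. The function names (`read_fastq`, `split_by_read_length`) and the `len_<N>.fastq` output naming are my own placeholders, not anything from Salmon. It assumes uncompressed, single-end, 4-line-per-record fastq and untrimmed reads, so that read length is a usable proxy for sequencing run; with trimmed reads it would produce one file per distinct length, which is probably not what I want.

```python
from collections import defaultdict
from pathlib import Path


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) ends the iteration
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def split_by_read_length(fastq_path, out_dir):
    """Write one FASTQ per distinct read length; return {length: read count}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles = {}  # read length -> open output file
    counts = defaultdict(int)
    try:
        with open(fastq_path) as fin:
            for header, seq, plus, qual in read_fastq(fin):
                length = len(seq)
                if length not in handles:
                    # Hypothetical naming scheme: len_<read length>.fastq
                    handles[length] = open(out_dir / f"len_{length}.fastq", "w")
                handles[length].write(f"{header}\n{seq}\n{plus}\n{qual}\n")
                counts[length] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return dict(counts)
```

Calling `split_by_read_length("sample.fastq", "sample_split")` (both paths hypothetical) would return the number of reads per length, which is also a quick way to check how many distinct read lengths a concatenated file actually contains.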
If I use the first (single file) approach, I think I should at least shuffle the reads (per the read order section of the documentation). If I use the second (multiple file) approach, should I use the same index for every file, or different indices built with k values suited to each read length? I am using Salmon in non-alignment-based mode with a quasi-mapping-based index.
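For the shuffling step, this is roughly what I have in mind, as a sketch rather than a tested pipeline. The function name `shuffle_fastq` is a placeholder of mine. It assumes an uncompressed, single-end, 4-line-per-record fastq file that fits in memory; for very large files an external tool or a chunked approach would be needed instead.

```python
import random


def shuffle_fastq(in_path, out_path, seed=0):
    """Shuffle 4-line FASTQ records in memory; return the number of records written."""
    with open(in_path) as fin:
        lines = fin.read().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    # Group the flat list of lines into records so each read stays intact.
    records = [lines[i:i + 4] for i in range(0, len(lines), 4)]
    # A seeded generator makes the shuffle reproducible.
    random.Random(seed).shuffle(records)
    with open(out_path, "w") as fout:
        for record in records:
            fout.write("\n".join(record) + "\n")
    return len(records)
```

The fixed seed is only there so the shuffle can be repeated. If the data were paired-end, I believe running this with the same seed on both mate files would apply the same permutation to each, provided the two files hold the same number of records in the same order, but I have not verified that on real data.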