I am running kallisto quant with the following command:
kallisto quant -i gencodeV27 -o sample1 --bias --rf-stranded --genomebam --gtf gencode.v27lift37.annotation.gtf --chromosomes chromSize.txt sample1_1.fq sample1_2.fq
One summary line in the output caught my attention. If I read it correctly (correct me if I am wrong), it says there is a total of 71,414,042 paired reads in my FASTQ files:
[quant] processed 71,414,042 reads, 65,645,163 reads pseudoaligned
The number of processed reads does not match the read count I get from trim_galore. I am using trimmed FASTQ files for this test, and the trim_galore summary after processing is shown below. As I read it, I should have only 65,853,559 (= 66,089,021 - 235,462) reads in the input FASTQ files for kallisto.
RUN STATISTICS FOR INPUT FILE: sample1.fastq.gz
=============================================
66089021 sequences processed in total

Total number of sequences analysed for the sequence pair length validation: 66089021

Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 235462 (0.36%)
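To check this independently of either tool, the records in each FASTQ file could be counted directly. Below is a minimal Python sketch of that check. It assumes plain four-line FASTQ records (no wrapped sequence lines); `count_fastq_reads` is a helper name made up for illustration, and the constants are just the numbers quoted in this post.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly 4 lines per record.

    Hypothetical helper for illustration; handles plain and gzipped files.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


# Numbers copied from the trim_galore and kallisto output quoted above.
trimgalore_total = 66_089_021
trimgalore_removed = 235_462
kallisto_processed = 71_414_042

# Pairs that should remain after trimming, and the unexplained surplus.
expected_pairs = trimgalore_total - trimgalore_removed
excess = kallisto_processed - expected_pairs

print(f"expected pairs after trimming: {expected_pairs:,}")
print(f"kallisto processed:            {kallisto_processed:,}")
print(f"difference:                    {excess:,}")
```

Running `count_fastq_reads("sample1_1.fq")` and `count_fastq_reads("sample1_2.fq")` on the actual inputs should show whether the files hold the 65,853,559 records expected from trim_galore or the 71,414,042 reported by kallisto. By the arithmetic above, the gap between the two tools is 5,560,483 reads.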
Any comments on my interpretation above would be much appreciated. TL;DR: why is kallisto processing more reads than there are in the input FASTQ files?