Hi, I'm new to analyzing RNA-seq data. I started by using hisat2 to align RNA-seq reads. My main goal is differential gene expression analysis comparing multiple control samples against case samples.
I did a test run with hisat2 using basically these options: --dta-cufflinks --rna-strandness. I noticed that the number of alignments in the BAM file is greater than the number of reads in the original FASTQ file. Puzzled by this, I searched around and found the -k option, which has a default value of 5, so there can be up to 5 alignments for one read. The manual says:
-k <int> It searches for at most <int> distinct, primary alignments for each read. Primary alignments mean alignments whose alignment score is equal or higher than any other alignments.
I think this is the reason the number of alignments is greater than the number of reads.
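To check this, I think something like the sketch below would work. It is only a rough idea, not a tested pipeline: it assumes the BAM has been converted to SAM text first (for example with samtools view), and that the extra alignments of a multi-mapped read carry the secondary flag (0x100), which is my understanding of the hisat2 output but not something I have confirmed. The function name summarize_sam and the toy records are made up for illustration.

```python
from collections import Counter

# SAM FLAG bits (from the SAM specification)
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def summarize_sam(lines):
    """Count SAM records by type: primary, secondary, supplementary, unmapped.

    `lines` is any iterable of SAM text lines (header lines are skipped).
    """
    counts = Counter()
    for line in lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is the FLAG
        counts["records"] += 1
        if flag & UNMAPPED:
            counts["unmapped"] += 1
        elif flag & SECONDARY:
            counts["secondary"] += 1
        elif flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        else:
            counts["primary"] += 1
    return counts


# Toy, hand-written records (NOT real data): read r1 aligns to two places,
# read r2 is unaligned.
toy_sam = [
    "@HD\tVN:1.0\n",
    "r1\t0\tchr1\t100\t1\t50M\t*\t0\t0\t*\t*\tNH:i:2\n",
    "r1\t256\tchr2\t500\t1\t50M\t*\t0\t0\t*\t*\tNH:i:2\n",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n",
]

summary = summarize_sam(toy_sam)
print(dict(summary))
```

If that assumption holds, "primary" plus "unmapped" should match the number of reads in the FASTQ file (counting each mate separately for paired-end data), and "secondary" would account for the surplus alignments.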
So I'm curious: what would be the ideal -k value to use, and how does this option affect downstream analysis such as gene counts?