I would like a second opinion on my reasoning for not using RPKM to normalize between my samples. Any feedback would be highly appreciated.
Reasoning for not normalizing for transcript length and total mapped reads per sample (RPKM): I do not analyze differential expression between genes within a sample. I compare reads mapped to the same target (e.g. a virus genome sequence or a virus gene sequence) between samples. The target length is therefore identical in every sample, so dividing by it only rescales all samples by the same constant and does not change the comparison between them.

Regarding "total mapped reads per sample", I expect these to differ between samples because of inherent characteristics of the samples. For example, in a sample taken from a highly infected individual I would expect a higher level of siRNA-specific reads than in a sample from an uninfected individual. Since this difference between treatment groups is exactly what I want to measure, I do not want to normalize it away by dividing by the total mapped reads.

NB! An alternative option would be to normalize by the overall total number of reads collected in the sequencing run, in case some samples were sequenced more deeply than others.
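To make the argument concrete, here is a minimal Python sketch of the two scalings discussed above. It uses the standard RPKM definition (reads per kilobase of target per million mapped reads) and, as the alternative, reads per million sequenced reads. The sample names, read counts, and target lengths are all made-up illustrative values, not data from my experiment, and the function names are my own.

```python
def rpkm(target_reads, total_mapped_reads, target_length_bp):
    """Standard RPKM: reads per kilobase of target per million mapped reads."""
    return target_reads * 1e9 / (total_mapped_reads * target_length_bp)


def reads_per_million_sequenced(target_reads, total_sequenced_reads):
    """Scale by sequencing depth only (all reads collected, mapped or not)."""
    return target_reads * 1e6 / total_sequenced_reads


# Hypothetical counts for two samples mapped against the same viral target.
samples = {
    "infected": {"target": 50_000, "mapped": 2_000_000, "sequenced": 10_000_000},
    "uninfected": {"target": 500, "mapped": 1_000_000, "sequenced": 5_000_000},
}


def rpkm_ratio(target_length_bp):
    """Infected/uninfected RPKM ratio for a given (shared) target length."""
    inf, uninf = samples["infected"], samples["uninfected"]
    return (rpkm(inf["target"], inf["mapped"], target_length_bp)
            / rpkm(uninf["target"], uninf["mapped"], target_length_bp))


def depth_ratio():
    """Infected/uninfected ratio after scaling by sequencing depth only."""
    inf, uninf = samples["infected"], samples["uninfected"]
    return (reads_per_million_sequenced(inf["target"], inf["sequenced"])
            / reads_per_million_sequenced(uninf["target"], uninf["sequenced"]))


if __name__ == "__main__":
    # The between-sample ratio is the same whatever shared length is assumed.
    for length in (5_000, 10_000):
        print(length, rpkm_ratio(length))
    print("depth-normalized ratio:", depth_ratio())
```

Because the target length appears in both numerator and denominator of the between-sample ratio, it cancels, which is the point of the first argument. Whether to divide by mapped reads or by sequenced reads is the real decision, and the sketch only shows how each is computed, not which one is appropriate.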