I'm asking for a second opinion on my reasoning for not using RPKM to normalize between my samples. Any feedback would be highly appreciated.
Reasoning for not normalizing for transcript length and total mapped reads per sample (RPKM): I do not compare expression levels of different genes within a sample. Instead, I compare the reads mapped to the same target (e.g. a virus genome sequence or a virus gene sequence) across samples. Since the target has the same length in every sample, I do not need to normalize for transcript length. As for “total mapped reads per sample”, I expect these to differ between samples because of inherent characteristics of the samples. For example, I would expect a sample from a highly infected individual to contain more siRNA-specific reads than a sample from an uninfected individual. Because I am measuring differences between treatment groups, I do not want to normalize away this variation in total mapped reads. NB! An alternative would be to normalize to the total number of reads produced by sequencing, in case some samples were sequenced more deeply than others.
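To make the argument concrete, here is a minimal sketch comparing raw counts, RPKM, and counts per million *sequenced* reads for a single target across two samples. All read counts and the 10 kb target length are made-up, hypothetical numbers; the function names are illustrative and not from any library.

```python
# Hypothetical example: one viral target, two samples, made-up read counts.

def rpkm(target_reads: int, target_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of target per million mapped reads."""
    return target_reads / (target_length_bp / 1e3) / (total_mapped_reads / 1e6)


def per_million_sequenced(target_reads: int, total_sequenced_reads: int) -> float:
    """Target reads per million reads produced by sequencing (depth scaling only)."""
    return target_reads / (total_sequenced_reads / 1e6)


TARGET_LENGTH_BP = 10_000  # hypothetical target length, identical for every sample

samples = {
    # name: (reads on target, total mapped reads, total sequenced reads), all hypothetical
    "infected": (50_000, 2_000_000, 10_000_000),
    "uninfected": (500, 1_000_000, 10_000_000),
}

for name, (on_target, mapped, sequenced) in samples.items():
    print(
        f"{name}: raw={on_target}, "
        f"RPKM={rpkm(on_target, TARGET_LENGTH_BP, mapped):.1f}, "
        f"per million sequenced={per_million_sequenced(on_target, sequenced):.1f}"
    )
```

With these numbers the infected/uninfected ratio is 100 for raw counts and for counts per million sequenced reads, but only 50 for RPKM, because the infection itself doubled the total mapped reads in this made-up example. The length term cancels out of any between-sample ratio for the same target.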
Thanks for the feedback. Your point about normalizing to total reads sounds reasonable to me. Being new to the field, I am probably not aware of all the pitfalls in RNA-seq. In retrospect, a spike-in would have been a good idea, since it would have let me detect technical variation between samples introduced during library prep and sequencing.
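For reference, a simple form of spike-in normalization could look like the sketch below. It assumes the same amount of spike-in RNA was added to every sample, so that differences in spike-in read counts reflect technical variation rather than biology. The function names and counts are hypothetical, and real spike-in workflows (e.g. with multiple spike-in transcripts) are usually more involved.

```python
# Sketch of spike-in scaling. Assumes an equal amount of spike-in RNA was added
# to every sample, so differences in spike-in reads are technical, not biological.

def spike_in_factors(spike_reads: list[int]) -> list[float]:
    """Scaling factor per sample: spike-in reads relative to the mean across samples."""
    if any(s <= 0 for s in spike_reads):
        raise ValueError("every sample needs at least one spike-in read")
    mean_spike = sum(spike_reads) / len(spike_reads)
    return [s / mean_spike for s in spike_reads]


def spike_in_normalize(target_reads: list[int], spike_reads: list[int]) -> list[float]:
    """Divide each sample's target count by its spike-in scaling factor."""
    factors = spike_in_factors(spike_reads)
    return [t / f for t, f in zip(target_reads, factors)]


# Hypothetical counts: sample B yielded twice as many spike-in reads as sample A,
# so its target count is scaled down relative to sample A.
print(spike_in_normalize(target_reads=[500, 1000], spike_reads=[1000, 2000]))
```

In this made-up case both samples end up with roughly 750 normalized target reads, i.e. the apparent twofold difference is attributed entirely to technical variation.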
My results stem from 10 RNA samples (each a pool of 5 biological replicates) taken from different treatment groups at various time points. I know I should have included technical replicates in library prep and sequencing, but cost was an issue here. However, the results I have gathered so far paint a clear picture: they are very consistent with what I expect and agree with several other types of analyses. So I am not sure DESeq will help. Wouldn't this method be more useful for checking the variance between technical replicates?
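I see. The DESeq workflow can be split into two steps:
1. Normalization (estimating a size factor for each sample)
2. Estimating dispersions and testing for differential expression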
Step 2 works best with replicates (although it also works without them*). The normalization step, however, does not depend on replicates, so you can still normalize your data with DESeq and then use the normalized counts in the next steps of your analysis.
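The normalization step is, by default, the median-of-ratios method (DESeq2's estimateSizeFactors). The sketch below re-implements the idea for illustration only; it is not DESeq2's code, and in practice you would call estimateSizeFactors() and counts(dds, normalized=TRUE) in R. The method assumes most features are not differentially abundant between samples, so the factors should be estimated over many features (e.g. all host small RNAs), not only the viral target; the median makes it robust to a minority of strongly changed features. The function names and counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts: list[list[int]]) -> list[float]:
    """Median-of-ratios size factors; counts[g][s] is the raw count of feature g in sample s."""
    n_samples = len(counts[0])
    # Only features with a non-zero count in every sample (log(0) is undefined).
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no feature has a non-zero count in every sample")
    # Log of each feature's geometric mean across samples (the pseudo-reference).
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for s in range(n_samples):
        # Median log ratio of this sample to the pseudo-reference, back-transformed.
        log_ratios = [math.log(row[s]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts: list[list[int]], factors: list[float]) -> list[list[float]]:
    """Divide every count by its sample's size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Hypothetical counts: sample 2 was sequenced about twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [50, 100], [0, 7]]
factors = size_factors(counts)
print(factors)
print(normalize(counts, factors))
```

Here the feature with a zero count is ignored when estimating the factors but is still normalized, and the second sample's factor comes out twice the first, so the normalized counts of the other three features agree between samples.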
* See the DESeq2 manual section 5.8: