To my understanding, the main aim of TMM normalization is to account for library size variation between the samples of interest. I have simulated RNA-seq data with equal library sizes for all samples. I ran TMM normalization and expected all normalization factors (from the calcNormFactors() function) to be equal to one. However, the factors vary from 0.4 to 2.4 (with a median of 1, of course), which is not what I expected. Have I misunderstood something here? Another question: can I use TMM normalization on non-binomial values, for instance on TPM values?
Thanks in advance!
Exactly how did you simulate the data? TMM is a robust measure, so if you produced very different distributions of reads between samples, then that would be the cause.
To my knowledge TMM is supposed to correct mostly for composition bias (as well as library size). If you generated samples with different compositions then it's correct that the normalisation factors would vary.
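To make the composition point concrete, here is a minimal sketch in Python. It is not edgeR's implementation: the helper names (trimmed_mean_m, norm_factors) and the toy counts are made up for illustration, and it uses an unweighted trimmed mean of the log-ratios, trimming only on M, whereas calcNormFactors() also trims on average abundance and uses precision weights. The one part taken directly from edgeR's behaviour is the final rescaling so that the factors multiply to 1.

```python
import math

def trimmed_mean_m(test, ref, trim=0.3):
    """Simplified TMM-style factor of `test` relative to `ref`.

    Assumption: unweighted mean of log2 ratios after trimming the
    lowest and highest `trim` fraction. This is an illustration only.
    """
    n_test, n_ref = sum(test), sum(ref)
    # M-values: log2 ratio of per-gene proportions, skipping zero counts
    m_values = sorted(
        math.log2((t / n_test) / (r / n_ref))
        for t, r in zip(test, ref)
        if t > 0 and r > 0
    )
    k = int(len(m_values) * trim)
    kept = m_values[k:len(m_values) - k]
    return 2 ** (sum(kept) / len(kept))

def norm_factors(samples, ref_index=0):
    """Factor per sample, rescaled so the factors multiply to 1."""
    raw = [trimmed_mean_m(s, samples[ref_index]) for s in samples]
    geo_mean = math.exp(sum(math.log(f) for f in raw) / len(raw))
    return [f / geo_mean for f in raw]

# Toy data: both samples have a library size of exactly 1000 reads.
sample_a = [100] * 10          # reads spread evenly over 10 genes
sample_b = [550] + [50] * 9    # one gene takes more than half the reads

factors = norm_factors([sample_a, sample_b])
print(sum(sample_a), sum(sample_b))
print(factors)
```

Both toy libraries are the same size, yet the factors come out at roughly 1.41 and 0.71. The single dominant gene in sample_b pushes every other gene down to half its proportion in sample_a, and the trimmed mean picks up that shift while ignoring the outlier gene itself.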
I haven't produced the data myself, but yes, the distribution of the reads varies significantly between samples. Can you elaborate a little on what you mean by a robust measure, and in what way the distribution affects the normalization factors?