My group just published this paper; I think it will be of interest to you.
Glusman G, Caballero J, Robinson M, Kutlu B, Hood L (2013) Optimal Scaling of Digital Transcriptomes. PLoS ONE 8(11): e77885. doi:10.1371/journal.pone.0077885
Very thorough analysis. I am still working my way through it. One comment I have is about your choice of using number of uniform genes as a metric of normalization success. Aren't you making the assumption that differentially expressed genes will not be uniformly different? I guess for this metric to fail, you would need samples where the majority of genes are differentially expressed and a majority of the differentially expressed genes are uniformly different. So perhaps it is a safe assumption to make? My second question is whether there are publicly available data with spike-ins you could have used as the gold standard?
The data you showed, where scaling by a small number of housekeeping genes yields poor normalization results, are interesting. It seems this would have significant implications for the many qPCR datasets that are normalized using housekeeping genes.
Thanks for your comments. The assumption is that a large portion of the genes are uniformly expressed; after scaling, you can use your preferred method for DEG detection.
Regarding your second question, my problem with spike-ins is that they test only a few sequences, but perhaps we can try to see how the methods perform with such data sets.
And yes, there are previous publications criticizing housekeeping genes, some going further and saying that no such genes exist.
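To make the "number of uniform genes" idea from this thread concrete, here is a minimal Python sketch. It is not the paper's implementation: the function name `count_uniform_genes`, the coefficient-of-variation cutoff `max_cv`, and the toy counts are all assumptions made up for illustration. It only shows the general idea that a good set of scaling factors should leave many genes with nearly constant scaled expression across samples.

```python
from statistics import mean, pstdev


def count_uniform_genes(counts, scale_factors, max_cv=0.1):
    """Count genes whose scaled expression is nearly constant across samples.

    counts: dict mapping gene name -> list of raw counts, one per sample.
    scale_factors: one divisor per sample (hypothetical scaling factors).
    max_cv: assumed cutoff on the coefficient of variation (stdev / mean).
    """
    uniform = 0
    for row in counts.values():
        scaled = [c / f for c, f in zip(row, scale_factors)]
        mu = mean(scaled)
        if mu == 0:
            continue  # unexpressed genes carry no information here
        if pstdev(scaled) / mu <= max_cv:
            uniform += 1
    return uniform


# Toy data: three samples sequenced at depths of roughly 1x, 2x and 4x.
# geneC is the only gene that changes independently of depth.
counts = {
    "geneA": [100, 200, 400],
    "geneB": [50, 100, 200],
    "geneC": [10, 80, 15],
    "geneD": [30, 60, 120],
}

unscaled = count_uniform_genes(counts, [1.0, 1.0, 1.0])
scaled = count_uniform_genes(counts, [1.0, 2.0, 4.0])
print(unscaled, scaled)
```

In this toy case the depth-matched factors recover three uniform genes, while no scaling recovers none. The concern raised earlier in the thread applies here too: if most genes were differentially expressed in the same direction, a wrong set of factors could also score well under this kind of metric.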