1. In this Biostars post, https://www.biostars.org/p/106127/, it says that the TCGA RNAseqV2 rsem.genes.normalized_results values are calculated as follows: "For gene level estimates you divide all "raw_count" values by the 75th percentile of the column (after removing zeros) and multiply that by 1000."
What is the reason for multiplying by 1000?
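For reference, here is a minimal sketch of the normalization described in that quote, assuming the 75th percentile is taken per sample over the non-zero raw counts (the function name and the `scale` parameter are my own, not from TCGA):

```python
import numpy as np

def uq_normalize(raw_counts, scale=1000):
    """Upper-quartile normalization as quoted above: divide each gene's
    raw_count by the 75th percentile of the sample's non-zero counts,
    then multiply by a fixed scale factor (1000 in the TCGA recipe)."""
    counts = np.asarray(raw_counts, dtype=float)
    # 75th percentile computed after removing zeros
    uq = np.percentile(counts[counts > 0], 75)
    return counts / uq * scale

# Toy example: one sample with raw counts for five genes
print(uq_normalize([0, 1, 2, 3, 4]))
```

As far as I can tell, the scale factor only moves the values into a convenient numeric range; it cancels out in any comparison across genes or samples.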
2. To avoid problems with zero counts during log2 transformation, people typically add 1 to the read counts. Is this done before the upper-quartile normalization step? I am thinking that if we add 1 after normalization, it wouldn't make sense, because some normalized counts can be very small (e.g. 0.0001), and log2(0.0001) versus log2(1.0001) is a huge difference.
Or do people typically add 1 only to the (normalized) counts that are 0 before the log2 transformation?
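To put numbers on the concern in question 2, here is the gap between a tiny normalized count with and without the +1 pseudocount (the 0.0001 value is just the illustrative number from the question):

```python
import math

small = 0.0001  # a very small normalized count

# Without a pseudocount the log is strongly negative
print(math.log2(small))        # ~ -13.29

# Adding 1 after normalization compresses it to nearly zero
print(math.log2(small + 1.0))  # ~ 0.000144
```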