TCGA RNAseqV2 upper quartile normalization with x1000 adjustment factor

8.5 years ago

CHANG ▴ 40

According to this post, the TCGA RNAseqV2 rsem.genes.normalized_results values are calculated as follows: "For gene level estimates you divide all "raw_count" values by the 75th percentile of the column (after removing zeros) and multiply that by 1000." What is the reason for multiplying by 1000?
To avoid problems with zero counts during log2 transformation, people typically add 1 to the read counts. Is this done before the upper quartile normalization step? Adding 1 after normalization doesn't seem to make sense to me, because some normalized read counts can be very small (e.g. 0.0001), and log2(0.0001) versus log2(1.0001) would be a huge difference.

Or do people typically add 1 only to the (normalized) counts that are 0 before log2 transformation?
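
To make the question concrete, here is a minimal sketch of the calculation as described in the quote, with made-up counts. The function name `upper_quartile_normalize`, the `scale` parameter, and the toy data are my own assumptions for illustration, not the actual TCGA pipeline code. It compares log2(x + 1) applied after normalization with and without the x1000 factor. Whether this is the actual reason for the factor is exactly what I am unsure about.

```python
import numpy as np

def upper_quartile_normalize(raw_counts, scale=1000.0):
    """Divide counts by the 75th percentile of the non-zero values, then multiply by `scale`.

    Hypothetical helper that follows the quoted description; not TCGA's own code.
    """
    counts = np.asarray(raw_counts, dtype=float)
    nonzero = counts[counts > 0]          # "after removing zeros"
    if nonzero.size == 0:
        raise ValueError("all counts are zero; upper quartile is undefined")
    q75 = np.percentile(nonzero, 75)      # 75th percentile of the column
    return counts / q75 * scale

# Toy raw_count column for one sample (made-up values).
raw = np.array([0, 10, 20, 30, 40, 50], dtype=float)

unscaled = upper_quartile_normalize(raw, scale=1.0)     # no x1000 factor
scaled = upper_quartile_normalize(raw, scale=1000.0)    # with x1000 factor

# Pseudocount of 1 added AFTER normalization, then log2.
log_unscaled = np.log2(unscaled + 1)
log_scaled = np.log2(scaled + 1)

# Log2 fold difference between the gene with 50 reads and the gene with 10 reads.
true_lfc = np.log2(50 / 10)
lfc_unscaled = log_unscaled[5] - log_unscaled[1]
lfc_scaled = log_scaled[5] - log_scaled[1]

print("true log2 ratio:          ", true_lfc)
print("log2(x+1), no x1000:      ", lfc_unscaled)
print("log2(x+1), with x1000:    ", lfc_scaled)
```

In this toy example the pseudocount compresses the difference between the two genes much more when the normalized values are near 1 than when they are in the hundreds, and zeros still map to log2(1) = 0 in both cases.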

RNA-Seq • 3.7k views

updated 20 months ago by Ram 43k • written 8.5 years ago by CHANG ▴ 40