Question

TCGA RNA-seq comparison between tumors

8.7 years ago

Les Ander ▴ 110

Hi,

I have seen questions about normalizing TCGA RNA-seq data raised before, but I am not sure they answer my question, so I hope someone can comment on the results I am observing.

I wish to partition BRCA tumors into groups with high or low levels of gene X and, separately, high or low levels of gene Y.

So I partition the tumors into four groups: (1) lowX, (2) highX, (3) lowY and (4) highY.
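One way to make such a split is to apply fixed quantile cutoffs to each gene. A minimal NumPy sketch, assuming a genes x samples matrix and hypothetical quartile thresholds (the post does not say how "low" and "high" were defined); the function name is illustrative only:

```python
import numpy as np


def partition_by_gene(expr, gene_idx, low_q=0.25, high_q=0.75):
    """Split samples (columns) into low and high groups for one gene.

    expr           : genes x samples array of normalized expression values
    gene_idx       : row index of the gene used for the split (e.g. X or Y)
    low_q, high_q  : hypothetical quantile cutoffs; the post does not say
                     which thresholds were used
    Returns (low_cols, high_cols), arrays of sample (column) indices.
    """
    values = np.asarray(expr, dtype=float)[gene_idx, :]
    lo_cut, hi_cut = np.quantile(values, [low_q, high_q])
    low_cols = np.flatnonzero(values <= lo_cut)   # bottom quantile of samples
    high_cols = np.flatnonzero(values >= hi_cut)  # top quantile of samples
    return low_cols, high_cols


# Toy data: gene "X" in row 0, gene "Y" in row 1, across 8 tumors.
expr = np.array([[1, 2, 3, 4, 5, 6, 7, 8],
                 [8, 7, 6, 5, 4, 3, 2, 1]], dtype=float)
low_x, high_x = partition_by_gene(expr, gene_idx=0)
low_y, high_y = partition_by_gene(expr, gene_idx=1)
```

The X and Y splits are computed independently, so the same tumor can fall into, say, highX and lowY. A median split (`low_q=high_q=0.5`) would instead place every tumor in one group per gene.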

I started from the unnormalized gene counts and normalized them by the column sum (the "library size") so that I can compare the same gene between samples. I get similar results when I use the RSEM-normalized gene values. [Please note: I prefer to start from unnormalized counts because I want to know exactly which steps I am applying to my data.]
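The column-sum step can be sketched in NumPy as below. This is a minimal illustration on a made-up genes x samples matrix, not the poster's code; the function name and the per-million scaling are assumptions.

```python
import numpy as np


def library_size_normalize(counts, scale=1e6):
    """Divide each sample (column) by its total count (its "library size").

    counts : genes x samples array of raw (unnormalized) counts
    scale  : 1e6 gives counts per million (CPM); 1 gives plain fractions
    """
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)        # one column sum per sample
    return counts / lib_sizes * scale     # broadcasting divides each column


# Toy example: sample 2 was sequenced twice as deeply as sample 1 but has
# the same gene proportions, so both normalize to the same values.
counts = [[10, 20],
          [90, 180]]
cpm = library_size_normalize(counts)
```

Because every column is forced to the same total, the results are relative: if a few very highly expressed genes take up a larger share of the reads in one tumor, every other gene in that tumor is pushed down, and vice versa.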

Please see the figure below:

As expected, expression of X is low in the lowX group and high in the highX group.

Strangely, however, expression of X is high when Y is high. Similarly, expression of Y is high when X is high (or normal).

This suggests that in tumors where X is high, everything else is also high (a global effect), and similarly for Y.

The alternative possibility is that my normalization is not doing its job properly. What I want is to normalize the data so that within each tumor the genes are measured relative to one another, and I think the z-score is the proper way to do this. It would have to be a within-sample z-score, i.e. I would take all the gene expression values within a sample, scale them to mean 0 and standard deviation 1, and then look at the z-scores of genes X and Y.
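A minimal NumPy sketch of the within-sample z-score described in the question, assuming a genes x samples matrix. The optional `log2(x + 1)` step is an added assumption (it is not part of the plan above), included because a log transform is often applied to count-like data before z-scoring.

```python
import numpy as np


def within_sample_zscore(expr, log2_first=False):
    """Z-score all genes within each sample (column): mean 0, std 1 per column.

    expr       : genes x samples array (e.g. library-size-normalized values)
    log2_first : if True, apply log2(x + 1) before z-scoring (an assumption,
                 not part of the original plan)
    """
    expr = np.asarray(expr, dtype=float)
    if log2_first:
        expr = np.log2(expr + 1.0)
    mu = expr.mean(axis=0, keepdims=True)  # per-sample mean over all genes
    sd = expr.std(axis=0, keepdims=True)   # per-sample population std (ddof=0)
    return (expr - mu) / sd                # a zero-variance column gives nan


# Sample 2 is sample 1 multiplied by 10 (a pure global change in scale).
expr = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])
z = within_sample_zscore(expr)
```

Because each sample is rescaled separately, a purely multiplicative shift of a whole tumor (the second column here) disappears, so each gene's score reflects only its position relative to the other genes in the same tumor.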

Does this sound reasonable? I would appreciate any suggestions or advice. Thank you.

[Figure: screenshot (image not available)]

Tags: RNA-Seq, normalization, TCGA

updated 19 months ago by Ram 43k • written 8.7 years ago by Les Ander ▴ 110

Ram · Answer 1 · 2015-09-01

8.7 years ago

gc ▴ 20

Z-score normalization is not a great option here because it assumes normally distributed data, and RNA-seq counts are not normally distributed. Quantile normalization, which makes no assumption about the distribution, is more commonly used in this kind of case.
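For comparison, a minimal NumPy sketch of quantile normalization across samples: each value is replaced by the mean, over samples, of the values holding the same rank in their own column. This is an illustration rather than a reference implementation; the function name is hypothetical, and ties are broken by order of appearance instead of being averaged, which is a simplification.

```python
import numpy as np


def quantile_normalize(expr):
    """Quantile-normalize samples (columns) so they share one distribution.

    expr : genes x samples array of expression values
    Ties are ranked by order of appearance (stable sort), a simplification;
    some implementations give tied values the average of their ranks.
    """
    expr = np.asarray(expr, dtype=float)
    order = np.argsort(expr, axis=0, kind="stable")   # row order sorting each column
    sorted_vals = np.take_along_axis(expr, order, axis=0)
    reference = sorted_vals.mean(axis=1)              # mean of the k-th smallest values
    out = np.empty_like(expr)
    # Write reference[k] back to whichever gene held rank k in each column.
    ref_cols = np.repeat(reference[:, None], expr.shape[1], axis=1)
    np.put_along_axis(out, order, ref_cols, axis=0)
    return out


expr = np.array([[2.0, 3.0],
                 [1.0, 8.0],
                 [3.0, 4.0]])
qn = quantile_normalize(expr)
```

After normalization every column contains the same set of values, so any global shift of one tumor relative to another is removed by construction, while the rank of each gene within its tumor is kept.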

updated 19 months ago by Ram 43k • written 8.7 years ago by gc ▴ 20