Question: TCGA RNA-seq comparison between tumors
Les Ander (United States) wrote, 3.6 years ago:

Hi,

I have seen normalization of RNA-seq data from TCGA discussed here before, but I am not sure those threads answer my question, so I hope someone can comment on the results I am observing.

I wish to partition BRCA tumors by their expression of gene X and, separately, by their expression of gene Y.

So I partition the tumors into four groups: 1. lowX, 2. highX, 3. lowY, 4. highY.

I started from the unnormalized gene counts and normalized each sample by its column sum ("library size") so that I can compare the same gene between samples. I see similar results when I use the RSEM normalized gene values. [Please note, I prefer to start with unnormalized counts because I want to know exactly which steps I am taking in processing my data.]
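A minimal sketch of the column-sum normalization described above, written in plain Python. The function name `library_size_normalize`, the per-million scaling factor, and the toy counts are assumptions for illustration; they are not taken from the post or from TCGA.

```python
def library_size_normalize(counts, scale=1_000_000):
    """Divide each sample's counts by that sample's total, then rescale.

    counts: dict mapping sample id -> dict mapping gene -> raw count.
    Returns the same structure with counts per `scale` reads.
    """
    normalized = {}
    for sample, gene_counts in counts.items():
        library_size = sum(gene_counts.values())  # column sum for this sample
        if library_size == 0:
            raise ValueError(f"sample {sample!r} has no reads")
        normalized[sample] = {
            gene: count * scale / library_size
            for gene, count in gene_counts.items()
        }
    return normalized


# Toy data (hypothetical values): tumor_B was sequenced 4x deeper than
# tumor_A but has the same relative composition.
raw = {
    "tumor_A": {"X": 10, "Y": 30, "Z": 60},
    "tumor_B": {"X": 40, "Y": 120, "Z": 240},
}
cpm = library_size_normalize(raw)
```

After scaling, the two toy samples have identical values for every gene, so the difference in sequencing depth is removed. Note that dividing by the total count can be skewed when a few very highly expressed genes take up a large share of the reads in some samples.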

Please see figure below:

As expected, expression of X is low in the lowX group and high in the highX group.

However, strangely, expression of X is high when Y is high. Similarly, expression of Y is high when X is high (or normal).

One possibility is that in the tumors where X is high, everything else is high as well (a global effect), and similarly for Y.

The alternative possibility is that my normalization is not doing a proper job. What I want is to normalize the data so that, within each tumor, every gene is measured relative to the other genes in that tumor. I think a z-score is the proper way to do this, and it would have to be a within-sample z-score: I would take all the gene expression values within a sample, normalize them to mean 0 and standard deviation 1, and then look at the z-scores of gene X and gene Y.
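The within-sample z-score proposed above could be sketched as follows. The function name `within_sample_zscores`, the optional log2(x + 1) transform (added here as an assumption, since raw expression values are heavily right-skewed), and the toy values are illustrative and not part of the original post.

```python
import math
import statistics


def within_sample_zscores(gene_values, log_transform=True):
    """Standardize all genes of ONE sample to mean 0 and std 1.

    gene_values: dict mapping gene -> normalized expression in one sample.
    log_transform: if True, apply log2(x + 1) first (an assumption of this
    sketch, not something stated in the post).
    """
    genes = list(gene_values)
    values = [gene_values[g] for g in genes]
    if log_transform:
        values = [math.log2(v + 1.0) for v in values]
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        raise ValueError("all genes have the same value; z-score undefined")
    return {g: (v - mean) / sd for g, v in zip(genes, values)}


# Toy sample (hypothetical values).
sample = {"X": 0.0, "Y": 3.0, "Z": 15.0, "W": 63.0}
z = within_sample_zscores(sample)
```

Each z-score then says how far a gene sits from the average gene in the same tumor, in units of that tumor's own spread. Because the mean and standard deviation are computed across all genes, the result depends on which genes are included in the table.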

Does this sound reasonable? I appreciate any suggestions or advice. Thank you

 

Tags: rna-seq, tcga, normalization, rnaseq
gc (United States) wrote, 3.6 years ago:

Z-score normalization is not a great option here because it assumes normally distributed data, and RNA-seq counts are not normally distributed. Quantile normalization, which does not assume any particular distribution, is more typically used in this kind of case.
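For reference, a minimal sketch of quantile normalization across samples in plain Python. The function name `quantile_normalize` and the toy values are hypothetical, and ties are handled naively (tied values get adjacent ranks in input order rather than an averaged rank), so treat this as an illustration of the idea and not as a replacement for an established implementation.

```python
def quantile_normalize(samples):
    """Force every sample to share the same empirical distribution.

    samples: dict mapping sample id -> list of expression values, with
    genes in the same order and the same number of genes in every sample.
    """
    names = list(samples)
    n_genes = len(samples[names[0]])
    if any(len(samples[s]) != n_genes for s in names):
        raise ValueError("all samples must have the same number of genes")

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_columns = [sorted(samples[s]) for s in names]
    reference = [
        sum(column[k] for column in sorted_columns) / len(names)
        for k in range(n_genes)
    ]

    normalized = {}
    for s in names:
        # Gene indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_genes), key=lambda i: samples[s][i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalized[s] = out
    return normalized


# Toy data (hypothetical values), three genes per sample.
data = {"tumor_A": [5.0, 2.0, 3.0], "tumor_B": [4.0, 1.0, 6.0]}
qn = quantile_normalize(data)
# qn["tumor_A"] == [5.5, 1.5, 3.5]; qn["tumor_B"] == [3.5, 1.5, 5.5]
```

Within each sample the ranking of genes is preserved, but after normalization every sample contains exactly the same set of values, which removes global shifts between tumors.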
