TCGA data: comparison of normal and tumor samples
7.9 years ago
Mike ★ 1.9k

Hello All,

I am working on TCGA lung cancer data and want to compare the average expression of a set of genes of interest between normal and tumor samples. What puzzles me is that the average expression of these genes is very similar in normal and tumor samples in the normalized log2 data (Fig 1, LUAD.uncv2.mRNAseq_RSEM_normalized_log2.txt), but differs in the Z-score data (Fig 2, LUAD.uncv2.mRNAseq_RSEM_Z_Score.txt).

[Fig 1: average expression per gene, using LUAD.uncv2.mRNAseq_RSEM_normalized_log2.txt]

[Fig 2: average expression per gene, using LUAD.uncv2.mRNAseq_RSEM_Z_Score.txt]

PS: the x-axis shows the genes in the same order in both figures.

So please help me: which input data is appropriate for this type of comparison?

Thank you...

TCGA • 3.3k views

Why are the Z-scores for the primary tumors so small and always close to zero? How did you pre-process the data? You should share the data and code (for example via Dropbox or a link) so that you can get more suggestions. I think most people would use the data in Figure 1.


Thanks Shicheng,

I used pre-processed data from Broad GDAC Firehose (https://gdac.broadinstitute.org/). I did not normalize the data myself. I just downloaded the preprocessed file LUAD.uncv2.mRNAseq_RSEM_Z_Score.txt (a 576 * 20501 matrix), extracted the subset for my gene list (576 * 88), split it into primary tumor (515 * 88) and normal (59 * 88) samples, and finally calculated and plotted the mean expression of each gene in each class separately.
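The steps described here (subset to a gene list, split samples into primary tumor and normal, average each gene per class) can be sketched as follows. This is only an illustration on made-up values, not the code actually used for the figures. It assumes the sample type is read from the TCGA barcode, where the two digits at the start of the fourth field are 01 for primary tumor and 11 for solid tissue normal. The function names, barcodes, gene names, and numbers are hypothetical.

```python
from statistics import mean

# TCGA barcode sample-type codes: 01 = primary tumor, 11 = solid tissue normal.
SAMPLE_TYPES = {"01": "tumor", "11": "normal"}


def sample_type(barcode):
    """Return 'tumor', 'normal', or None for other sample types."""
    code = barcode.split("-")[3][:2]
    return SAMPLE_TYPES.get(code)


def group_means(expr, genes):
    """expr maps barcode -> {gene: expression}; returns {gene: {group: mean}}."""
    result = {}
    for gene in genes:
        by_group = {"tumor": [], "normal": []}
        for barcode, values in expr.items():
            group = sample_type(barcode)
            if group is not None and gene in values:
                by_group[group].append(values[gene])
        result[gene] = {g: mean(v) for g, v in by_group.items() if v}
    return result


# Toy log2 expression values (made up for illustration).
expr = {
    "TCGA-AA-0001-01A": {"GENE1": 10.0, "GENE2": 5.0},
    "TCGA-AA-0002-01A": {"GENE1": 12.0, "GENE2": 5.0},
    "TCGA-AA-0001-11A": {"GENE1": 8.0, "GENE2": 5.0},
    "TCGA-AA-0003-02A": {"GENE1": 99.0, "GENE2": 99.0},  # neither 01 nor 11: skipped
}

means = group_means(expr, ["GENE1", "GENE2"])
print(means)
```

Samples that are neither primary tumor nor normal are skipped, which would explain why 515 + 59 is slightly less than 576.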


I tried hard to find the file you mentioned, but I cannot find it in the Firehose database, and I don't know why. Anyway, I think I have guessed why you see this problem. http://gdac.broadinstitute.org/runs/stddata__2016_01_28/data/LUAD/20160128/

7.9 years ago
Shicheng Guo ★ 9.4k

Here, Z_Score means:

Z_Score = ((expression in single tumor sample) - (mean expression in all tumor samples)) / (standard deviation of expression in all tumor samples)

That's why the Z-scores for the cancer group are very small in your Figure 2: averaged over all tumor samples, such a score is zero by construction.

And I think the Z-score for a normal sample is calculated in a similar way:

Z_Score = ((expression in single normal sample) - (mean expression in all normal samples)) / (standard deviation of expression in all normal samples)

That means each curve only shows the fluctuation of gene expression within that group.

I am pretty sure that you should use the data in Figure 1.
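To illustrate the point, here is a small sketch on made-up numbers for a single gene. It shows that Z-scores computed against a group's own mean and standard deviation average to zero, so they cannot show a tumor versus normal difference, while the log2 values can. Which reference population the Firehose Z-score file actually uses for the normal samples is not confirmed in this thread, so the sketch shows both possibilities. The helper name zscores and all values are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values, reference):
    """Z-score each value against the mean and standard deviation of `reference`."""
    mu = mean(reference)
    sd = pstdev(reference)
    return [(v - mu) / sd for v in values]


# Toy log2 expression of one gene (made up for illustration).
tumor = [9.0, 10.0, 11.0, 12.0]
normal = [6.0, 7.0, 8.0]

# On the log2 scale the group difference is visible directly.
log2_difference = mean(tumor) - mean(normal)

# Each group scored against itself: both averages are zero by construction.
z_tumor_within = zscores(tumor, tumor)
z_normal_within = zscores(normal, normal)

# Normals scored against the tumor reference: tumor average stays at zero,
# normal average moves away from zero.
z_normal_vs_tumor = zscores(normal, tumor)

print("log2 difference (tumor - normal):", log2_difference)
print("mean Z, tumor vs tumor:", round(mean(z_tumor_within), 6))
print("mean Z, normal vs normal:", round(mean(z_normal_within), 6))
print("mean Z, normal vs tumor:", round(mean(z_normal_vs_tumor), 6))
```

Either way, the tumor curve sits at zero for every gene, so the Z-score file is not suited to comparing average tumor and normal expression.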


Thanks again,

I'm using preprocessed data from https://gdac.broadinstitute.org/ (http://firebrowse.org/?cohort=LUAD), so I should use the normalized log2 data (the data in Figure 1).


Hello, I'm using level 3 normalized data from GDAC Firehose and have a question regarding the Z-score calculation in tumor samples alone. To calculate the Z-score of a gene X in a tumor sample s, how should the mean and standard deviation of the reference population be calculated?

Z_Score = ((expression of X in tumor sample s) - (mean expression of X in all tumor samples (population))) / (standard deviation of X's expression in all tumor samples)

or

Z_Score = ((expression of X in tumor sample s) - (mean expression of all genes (20K+) in all tumor samples (population))) / (standard deviation of expression of all genes (20K+) in all tumor samples)

Which formula should I use?

Thanks, sumithra
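For comparison, the two candidate formulas can be written out on toy data. This is only a sketch with made-up values and hypothetical function names. The first formula is the per-gene definition given in the answer above. The second pools all genes, so the result also reflects how highly the gene is expressed relative to other genes, not only how the sample deviates for that gene.

```python
from statistics import mean, pstdev

# Toy expression values: gene -> values across tumor samples (made up).
tumor_expr = {
    "GENE_X": [2.0, 4.0, 6.0],
    "GENE_Y": [10.0, 12.0, 14.0],
}


def per_gene_z(expr, gene, sample_idx):
    """Formula 1: reference is the same gene across all tumor samples."""
    vals = expr[gene]
    return (vals[sample_idx] - mean(vals)) / pstdev(vals)


def all_genes_z(expr, gene, sample_idx):
    """Formula 2: reference is every gene in every tumor sample, pooled."""
    pooled = [v for vals in expr.values() for v in vals]
    return (expr[gene][sample_idx] - mean(pooled)) / pstdev(pooled)


z1 = per_gene_z(tumor_expr, "GENE_X", 2)
z2 = all_genes_z(tumor_expr, "GENE_X", 2)
print("per-gene Z:", round(z1, 3))
print("all-genes Z:", round(z2, 3))
```

In this toy case the third sample has the highest GENE_X value among tumors, so the per-gene score is positive, while the pooled score is negative simply because GENE_X is expressed at a lower level than GENE_Y.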
