Question: question about co-expression in TCGA
9 months ago, tujuchuanli wrote:

Hi, all

I am working on gene co-expression analysis using the BRCA dataset in TCGA. To build a background distribution, I randomly pick two genes and calculate their Pearson correlation coefficient, and I repeat this 4000 times.

Each time, I randomly pick two genes and build a data matrix with two columns (one per gene) and one row per sample. Each row holds the expression values of gene A and gene B in one specific sample (RNA-seq data, RPKM). I calculate the Pearson correlation coefficient in R using cor.test, and I repeat the whole calculation 4000 times.

As I understand it, the Pearson correlation coefficients should be roughly half negative and half positive. However, most of them are positive: fewer than 10% are negative.
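For reference, the sampling procedure described above can be sketched on synthetic data. This is only an illustration under stated assumptions: the expression values are independent Gaussian draws rather than TCGA RPKM values, the names `background_correlations` and `expr` are made up for this sketch, and Python stands in for the R `cor.test` call. With genes that are independent by construction, the sign of r should split roughly evenly, which is the expectation stated above.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def background_correlations(expr, n_pairs, rng):
    """Draw n_pairs random gene pairs and return their Pearson r values.

    expr maps a gene name to its list of per-sample expression values.
    """
    genes = list(expr)
    r_values = []
    for _ in range(n_pairs):
        gene_a, gene_b = rng.sample(genes, 2)  # two distinct genes
        r_values.append(pearson(expr[gene_a], expr[gene_b]))
    return r_values


rng = random.Random(0)
n_genes, n_samples = 200, 100

# Synthetic stand-in data: every gene is drawn independently (assumption).
expr = {
    f"gene{i}": [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    for i in range(n_genes)
}

r_values = background_correlations(expr, 4000, rng)
frac_negative = sum(r < 0 for r in r_values) / len(r_values)
print(f"fraction of negative r: {frac_negative:.2f}")
```

If real data give far fewer negative values than this baseline, the genes are not behaving as independent draws, for example because some factor shared across samples affects most genes in the same direction.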

What is wrong with my calculation? Could you please give me some suggestions?

Thanks

co-expression • 432 views

Thanks, Kevin. I downloaded the hg38 RNA-seq data directly from the TCGA website, which does not provide RSEM values for gene expression. I am planning to download the hg19 version to test what you suggest. Thanks again.

written 9 months ago by tujuchuanli

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. This comment belongs under @Kevin's answer.

written 9 months ago by genomax
9 months ago, Kevin Blighe (Guy's Hospital, London) wrote:

What you find is likely explained by the fact that you're using RPKM data. As mentioned in my other comment (C: question about identifying differential expressed genes in TCGA), it would be better to obtain RSEM counts (now available for the majority of, if not all, TCGA datasets), process these with a 'modern' normalisation strategy, and then conduct Pearson correlation on the transformed counts.
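As a rough illustration of "normalise, transform, then correlate", here is a minimal sketch on toy counts. The answer does not name a specific method, so log2(CPM + 1) is used here purely as a simple stand-in and is an assumption, not the recommended normalisation; the gene names, counts, and helper names are invented for the example.

```python
import math


def counts_per_million(counts):
    """Scale raw counts so that each sample (column) sums to one million."""
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    library_sizes = [sum(counts[g][s] for g in genes) for s in range(n_samples)]
    return {
        g: [counts[g][s] / library_sizes[s] * 1e6 for s in range(n_samples)]
        for g in genes
    }


def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount keeps zeros finite."""
    return [math.log2(v + pseudocount) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy count matrix: gene -> counts in four samples (made-up numbers).
counts = {
    "geneA": [10, 20, 40, 80],
    "geneB": [5, 9, 22, 41],
    "geneC": [500, 480, 510, 490],
}

cpm = counts_per_million(counts)
logged = {gene: log2_transform(values) for gene, values in cpm.items()}

# Correlate on the transformed values rather than on the raw counts.
r_ab = pearson(logged["geneA"], logged["geneB"])
print(f"Pearson r (geneA vs geneB, log2 CPM): {r_ab:.3f}")
```

In practice a dedicated package would handle the normalisation step; the point of the sketch is only the order of operations, with the correlation computed after the transform.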

Kevin
