I am working on gene co-expression analysis using the BRCA dataset in TCGA. To get a background dataset, I randomly pick two genes and calculate their Pearson correlation coefficient, and I repeat this 4000 times.
Each time, I randomly pick two genes and build a data matrix with two columns, one per gene, and one row per sample. Each row holds the expression values of gene A and gene B in one specific sample (RNA-seq data, RPKM). I calculate the Pearson correlation coefficient in R using cor.test, and I repeat the whole calculation 4000 times.
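A minimal sketch of this procedure in R may make it easier to spot a mistake. It assumes the expression values sit in a genes-by-samples matrix called `expr`; that name, the matrix dimensions, and the simulated values are placeholders for illustration only, not the actual TCGA BRCA data or my exact script.

```r
# Hypothetical stand-in for the real data: a genes x samples matrix of RPKM values.
# The random values below are placeholders only, NOT TCGA BRCA data.
set.seed(1)
n_genes   <- 200
n_samples <- 50
expr <- matrix(rexp(n_genes * n_samples),
               nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

n_iter <- 4000
r <- numeric(n_iter)  # one Pearson coefficient per random gene pair

for (i in seq_len(n_iter)) {
  # Draw two distinct genes (sampling row indices without replacement)
  pair <- sample(nrow(expr), 2)
  x <- expr[pair[1], ]  # expression of gene A across all samples
  y <- expr[pair[2], ]  # expression of gene B across all samples

  # cor.test returns the coefficient in $estimate
  r[i] <- unname(cor.test(x, y, method = "pearson")$estimate)
}

# Fraction of gene pairs with a negative coefficient
frac_negative <- mean(r < 0)
```

Here `frac_negative` is the quantity I am asking about: the share of random gene pairs whose coefficient is below zero.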
From my understanding, the frequency distribution of the Pearson correlation coefficients should be roughly half negative and half positive. However, most of the coefficients are positive: fewer than 10% of them are negative.
What is wrong with my calculation? Could you please give me some suggestions?