Question: Gene-gene Pearson Correlation
6.4 years ago by
Université Paris, Saclay
pixie@bioinfo1.4k wrote:

I am stuck with a very simple problem. I want to build a Pearson correlation matrix for my microarray dataset. My .csv file consists of normalized, log-transformed expression values of 18k genes across 36 samples. I want to compute the gene-gene Pearson correlations from this matrix in R. After that, I want to convert the correlation matrix into an edge list, with the two genes in the first two columns and the correlation value in the last column. I tried the cor() function in R, but I guess there is some issue with numeric/character values, because it gives me the error "'x' must be numeric". Kindly suggest how I can read in the file and transform the matrix.


Gene sample1 sample2 sample3
A 10 50 78
B 50 45 55
C 70 56 44


microarray R • 12k views
written 6.4 years ago by pixie@bioinfo (modified 6.4 years ago by Istvan Albert)

I think the cor() function is also trying to use the gene names (column 1) to infer correlations. Have you tried cor(myData[,-1]) to calculate it? Also have a look at the vignette of the {stats} package!

written 6.4 years ago by Phil S. (modified 9 months ago by RamRS)
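A minimal sketch of the suggestion above, assuming the file is comma-separated with the gene names in the first column, as in the example table. The inline `csv_text` and the variable names are placeholders so that the snippet runs on its own; with real data you would pass the path of your own file to read.csv() instead.

```r
# Toy copy of the example table from the question (placeholder data).
csv_text <- "Gene,sample1,sample2,sample3
A,10,50,78
B,50,45,55
C,70,56,44"

# Read the table; the Gene column comes in as character.
myData <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# cor() on the full data frame fails because of the character column.
res <- try(cor(myData), silent = TRUE)
inherits(res, "try-error")

# Dropping the first column leaves only numeric columns.
sample_cor <- cor(myData[, -1])   # Pearson is the default method
dim(sample_cor)                   # samples x samples, not genes x genes
```

Note that cor() correlates columns, so with genes in rows this call returns sample-sample correlations; the matrix would still need to be transposed to get gene-gene values.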

Thanks. Just a clarification: I have to find correlations between A&B, B&C, etc. I read somewhere that R works column-wise, so I used cor(t(myData[,-1])) to transpose the matrix. Am I doing this right? Another issue is that this way I am losing all the row and column headings.

written 6.4 years ago by pixie@bioinfo (modified 6.4 years ago)

Transposing your data is, in general, just fine. Please keep in mind the comment from Istvan Albert.

written 6.4 years ago by Phil S. (modified 8 months ago by RamRS)
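One possible way to keep the headings and get the edge list asked for in the question is sketched below. It assumes the gene names are unique and sit in the first column; the toy data frame and the names `expr`, `gene_cor` and `edge_list` are illustrative only.

```r
# Toy expression table from the question (genes in rows, samples in columns).
myData <- data.frame(
  Gene    = c("A", "B", "C"),
  sample1 = c(10, 50, 70),
  sample2 = c(50, 45, 56),
  sample3 = c(78, 55, 44),
  stringsAsFactors = FALSE
)

# Move the gene names into the row names so they survive as.matrix() and t().
expr <- as.matrix(myData[, -1])
rownames(expr) <- myData$Gene

# cor() correlates columns, so transpose to put the genes in columns.
gene_cor <- cor(t(expr), method = "pearson")

# Keep each unordered pair once (upper triangle, diagonal excluded).
idx <- which(upper.tri(gene_cor), arr.ind = TRUE)
edge_list <- data.frame(
  gene1 = rownames(gene_cor)[idx[, "row"]],
  gene2 = colnames(gene_cor)[idx[, "col"]],
  correlation = gene_cor[idx],
  stringsAsFactors = FALSE
)
print(edge_list)
```

The resulting data frame can be saved with write.csv(edge_list, ..., row.names = FALSE). Using only the upper triangle avoids listing each pair twice and skips the trivial self-correlations.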
6.4 years ago by
Glasgow, UK
umer.zeeshan.ijaz1.8k wrote:

Use my script taxo_bivariate_plot.R. It uses cor() as suggested by Phil S. Usage information is as follows:

$ Rscript taxo_bivariate_plot.R --help
Usage: taxo_bivariate_plot.R [options] file
        CSV file
        Output path
        Font size [default 1.2]
        Width of jpeg files [default 800]
        Height of jpeg files [default 800]
        Correlation to use: 1=pearson, 2=spearman, 3=kendall [default 1]
        Mode: TRUE=R mode, FALSE=Q mode [default FALSE]
    -h, --help
        Show this help message and exit

This script generates bivariate plots with histograms on the diagonals, scatter plots with smooth curves below the diagonals and correlations with significance levels above diagonals. Data file has the following organization:

         Var_1 Var_2 Var_3 .. Var_R

For example,

$ head ENV_pitlatrine.csv
$ Rscript taxo_bivariate_plot.R --ifile=ENV_pitlatrine.csv

This will generate an image of the bivariate plots (image not reproduced here).

This Rscript is the back-end of my TAXAenv website. You can use --rmode to transpose your matrix, and change the script as needed to meet your needs.

Best Wishes,

written 6.4 years ago by umer.zeeshan.ijaz (modified 9 months ago by RamRS)

That is a nice solution there.

written 6.4 years ago by Istvan Albert

Thanks very much. I will try this out!

written 6.4 years ago by pixie@bioinfo
6.4 years ago by
University Park, USA
Istvan Albert ♦♦ 84k wrote:

It is not clear what you mean by gene-gene correlation. If you were to compare each gene to every other one, there would be about N*N correlations to compute. You would then end up with a large list of values that again needs to be processed to make any sense of it.
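To put rough numbers on this for the 18k genes mentioned in the question, here is a small back-of-the-envelope calculation. It assumes a dense correlation matrix of 8-byte doubles; the variable names are illustrative.

```r
n_genes <- 18000                       # gene count from the question

n_pairs <- choose(n_genes, 2)          # unique unordered gene pairs
n_pairs                                # 161991000

# A dense n x n matrix of doubles takes 8 bytes per entry.
matrix_gib <- n_genes^2 * 8 / 1024^3
round(matrix_gib, 2)                   # about 2.41
```

So the full matrix is feasible on a machine with enough memory, but an edge list with one row per pair would have roughly 162 million rows, which is why some filtering (for example by a correlation threshold) is usually applied before writing it out.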

If you want to select genes that are most similar to one another, then hierarchical clustering (with, say, correlation as its metric) is the way to go.
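As an illustration only, correlation-based hierarchical clustering might look like the sketch below. The toy matrix, the choice of 1 - r as the dissimilarity, average linkage, and the cut at two clusters are all assumptions made for the example, not something prescribed in this thread.

```r
set.seed(1)  # reproducible toy data

# Hypothetical toy matrix: 6 genes x 8 samples, two groups of co-varying genes.
base1 <- rnorm(8)
base2 <- rnorm(8)
expr <- rbind(
  g1 = base1 + rnorm(8, sd = 0.1),
  g2 = base1 + rnorm(8, sd = 0.1),
  g3 = base1 + rnorm(8, sd = 0.1),
  g4 = base2 + rnorm(8, sd = 0.1),
  g5 = base2 + rnorm(8, sd = 0.1),
  g6 = base2 + rnorm(8, sd = 0.1)
)

# Correlation-based dissimilarity: 0 for perfectly correlated genes,
# 2 for perfectly anti-correlated genes.
d <- as.dist(1 - cor(t(expr)))

# Cluster the genes and cut the tree into two groups.
hc <- hclust(d, method = "average")
clusters <- cutree(hc, k = 2)
clusters
```

Calling plot(hc) draws the dendrogram. With real data, the number of clusters and the linkage method would need to be chosen to suit the dataset.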

written 6.4 years ago by Istvan Albert