Question: Gene Expression Correlation
4.5 years ago by
United States
writersblog0240 wrote:

I want to correlate the expression of gene-setA with SetB using Pearson's correlation in R. Can someone help with the script?

R • 5.6k views
modified 3.7 years ago by Palgrave20 • written 4.5 years ago by writersblog0240

Yes, but that correlated the values row by row, e.g. SetA row 1 vs. SetB row 1, and so on. It did not create a matrix with the correlation of each gene in set A against all genes in set B.


Not sure what you are trying to achieve. Do you want to correlate the same gene in two conditions? One gene against all the others in the other conditions? Everything against everything? What do your data look like? Could you post a sample of the data so we can better help you out?

e.g.:

    SetA
    Gene  Expression
    a1    142
    a2    111
    a3    5

    SetB
    Gene  Expression
    b1    200
    b2    4
    b3    67

I want to create an a × b correlation matrix:

          a1          a2          a3
    b1    cor(a1,b1)  cor(a2,b1)  cor(a3,b1)
    b2    cor(a1,b2)  cor(a2,b2)  cor(a3,b2)
    b3    cor(a1,b3)  cor(a2,b3)  cor(a3,b3)

Also, what if the two gene sets are of unequal sizes?


You can't take the correlation of a pair of samples where each has only one observation. A single observation has no standard deviation, and the definition of Pearson's correlation coefficient requires a standard deviation (distances from the mean), which has no meaning for one observation: http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient

You need at least three observations of your two variables a and b, and both vectors need to be the same length, to do a Pearson correlation test with `cor.test()`. With its formula interface, you can tell the correlation test how to handle `NA` values with `na.action = "na.exclude"` (i.e., where you don't have a matching observation in one of the two paired samples, you exclude data from both vectors).
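As a rough illustration of the points above, here is a minimal sketch with made-up numbers (the vectors `a` and `b` are hypothetical, not the poster's data). It assumes paired observations where `NA` marks a missing value, and uses only base R's `sd()`, `cor()` and `cor.test()`.

```r
# Hypothetical paired measurements; NA marks a missing observation
a <- c(142, 111, 5, 60, NA, 33)
b <- c(200, 4, 67, 58, 12, NA)

# One observation has no spread, so there is nothing to correlate
single_sd <- sd(142)   # NA

# cor() has to be told to keep only the complete pairs
r <- cor(a, b, use = "complete.obs", method = "pearson")

# The default cor.test() method drops incomplete pairs itself
ct <- cor.test(a, b, method = "pearson")
ct$estimate   # same coefficient as r
ct$p.value    # p-value of the test
```

Four complete pairs remain here, which is above the three-observation minimum that `cor.test()` needs.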

I suggest reading about correlation and reading the relevant R documentation. What you're asking for doesn't seem to make much sense as described.

modified 4.5 years ago • written 4.5 years ago by Alex Reynolds29k

I got an error when running this function:

> for i in 1 to 100 {         # go through rows in first matrix
Error: unexpected symbol in "for i"


TriS wrote pseudocode: it is meant as a guideline, not as something that runs as-is in R.

You should read a short tutorial to get acquainted with loops in R, for example:
A Tutorial on Loops in R - Usage and Alternatives

This is the `for` loop syntax in R:
`for (i in 1:100) { print(i) }`

4.5 years ago by
Alex Reynolds29k
Seattle, WA USA
Alex Reynolds29k wrote:

Type in `?cor.test` within R, and take a look at the provided example at the bottom of the documentation.

modified 4.5 years ago • written 4.5 years ago by Alex Reynolds29k
4.5 years ago by
TriS3.9k
United States, Buffalo
TriS3.9k wrote:

yeah so...there are a few things you want to keep in mind

1. I hate saying it but use Google first for a quick search
2. what's your script so far? how does it look? it's easier to help you if you already have something
3. if the sets are of unequal size, you either use only the matching samples or simulate the missing data, repeating the simulation enough times to estimate the correlation coefficient...but pairwise deletion seems easier and leads to pretty much the same results
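To make point 3 concrete, here is a hedged sketch of "use only the matching samples" combined with pairwise deletion. The matrices `setA` and `setB`, their values, and the sample names `s1`..`s5` are all hypothetical; the sketch assumes genes in rows, samples in columns, and column names that identify the samples.

```r
# Hypothetical expression matrices: genes in rows, samples in columns
setA <- matrix(c(142, 111,   5,  60,  33,
                  20,  35,  50,  41,  18,
                   7,   9,  NA,  12,  15),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("a1", "a2", "a3"), paste0("s", 1:5)))
setB <- matrix(c(200,   4,  67,  58,
                  11,  22,  30,  25),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("b1", "b2"), paste0("s", 2:5)))

# Keep only the samples measured in both sets (here s2..s5)
shared <- intersect(colnames(setA), colnames(setB))

# cor() correlates columns, so transpose to put genes in columns.
# "pairwise.complete.obs" drops a sample only for gene pairs where it is NA.
cor_ab <- cor(t(setA[, shared]), t(setB[, shared]),
              use = "pairwise.complete.obs", method = "pearson")
cor_ab   # rows = genes of setA, columns = genes of setB
```

With so few samples the coefficients are not meaningful; the point is only the mechanics of matching samples and letting `cor()` handle the missing value.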

now, as a starting point, the logic can look like this:

```r
# Example data: 100 genes (rows) x 10 samples (columns) in each set
mat1 <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
mat2 <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)

# Preallocate the result: rows = genes in mat1, columns = genes in mat2
results <- matrix(NA_real_, nrow = nrow(mat1), ncol = nrow(mat2))

for (i in seq_len(nrow(mat1))) {      # go through rows in first matrix
  for (j in seq_len(nrow(mat2))) {    # go through rows in second matrix
    # Pearson correlation of gene i in mat1 with gene j in mat2
    results[i, j] <- cor(mat1[i, ], mat2[j, ], method = "pearson")
  }
}
```

this will give you the full correlation matrix, and if matrix 1 and matrix 2 are of different sizes it should be trivial to tweak
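As a possible alternative to the double loop, base R's `cor()` accepts two matrices and correlates every column of the first with every column of the second. The sketch below uses hypothetical random data with deliberately unequal gene counts (5 and 8 genes over the same 10 samples); the matrix names, sizes and seed are arbitrary choices for illustration.

```r
set.seed(42)                               # arbitrary seed, for reproducibility only
mat1 <- matrix(rnorm(5 * 10), nrow = 5)    # 5 genes x 10 samples
mat2 <- matrix(rnorm(8 * 10), nrow = 8)    # 8 genes x 10 samples

# Transpose so that genes become columns, then correlate all pairs at once
cor_mat <- cor(t(mat1), t(mat2), method = "pearson")
dim(cor_mat)                               # 5 rows (mat1 genes) x 8 columns (mat2 genes)

# Spot-check one entry against the single-pair calculation
single_pair <- cor(mat1[2, ], mat2[7, ], method = "pearson")
all.equal(cor_mat[2, 7], single_pair)
```

The two sets can have different numbers of genes, but both must cover the same samples in the same order.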