Question: Correlation between lncRNAs and mRNA from two microarrays?

0

kandoigaurav • **120** wrote:

Hi

I've two microarrays with the same samples (19), one profiled for lncRNAs and the other for mRNA. How can I find/calculate the correlation between each lncRNA and mRNA?

Thanks

modified 9 months ago by Biostar ♦♦ **20** • written 3.4 years ago by kandoigaurav • **120**

2

Irsan • **6.8k** wrote:

If for each mRNA/lncRNA only 1 sample is tested, there is no way to calculate the correlation between lncRNA and mRNA. Think of a scatter plot with an X-axis (e.g. expression of a particular mRNA), a Y-axis (e.g. expression of a particular lncRNA) and 1 point (one sample). You cannot fit/draw a meaningful line through 1 data point.

Edit: So you have 19 samples. I assume you have two matrices called mRNA and incRNA in R. The columns are samples and the rows are features (mRNAs or lncRNAs). The column names of the mRNA and incRNA objects should be the same and in the same order.
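A quick way to check that alignment before correlating, as a minimal sketch. The toy matrices are made up for illustration; only the object names mRNA and incRNA come from the text above.

```r
# Toy data (hypothetical): 3 features x 4 samples each,
# with the same samples stored in a different column order
set.seed(1)
mRNA <- matrix(rnorm(12), nrow = 3,
               dimnames = list(paste0("mRNA", 1:3), paste0("Sample", 1:4)))
incRNA <- matrix(rnorm(12), nrow = 3,
                 dimnames = list(paste0("lncRNA", 1:3),
                                 paste0("Sample", c(2, 1, 4, 3))))

# Both matrices must cover the same set of samples
stopifnot(setequal(colnames(mRNA), colnames(incRNA)))

# Reorder the lncRNA columns to match the mRNA column order
incRNA <- incRNA[, colnames(mRNA), drop = FALSE]

# After reordering, the column names agree position by position
stopifnot(identical(colnames(mRNA), colnames(incRNA)))
```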

```
# first you have to transpose your data
mat1 <- t(mRNA)
mat2 <- t(incRNA)
# then do the magic
result <- apply(mat1, 2, function(col_mat1) {
  apply(mat2, 2, function(col2, col1) {
    cor.test(col2, col1)$estimate # this returns the correlation coefficient of the cor.test
  }, col1 = col_mat1)
})

# mRNA1 - incRNA2 correlation is same as incRNA2 - mRNA1 correlation so remove them
result[lower.tri(result)] <- NA
# melt matrix to get in long format
library(reshape)
result <- melt(result)
# and remove the NA-values (the double ones)
result <- na.omit(result)
```

If you want the p-value for the correlation test you should replace $estimate with $p.value and use p.adjust(...) on your p-values to correct for multiple comparisons.
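A rough sketch of that p-value variant, assuming toy matrices with made-up values. The Benjamini-Hochberg method is only one possible choice for p.adjust and is an assumption here, not something stated in the answer.

```r
# Toy data (hypothetical): 2 mRNAs and 3 lncRNAs measured in the same 6 samples
set.seed(42)
mRNA <- matrix(rnorm(12), nrow = 2,
               dimnames = list(paste0("mRNA", 1:2), paste0("Sample", 1:6)))
incRNA <- matrix(rnorm(18), nrow = 3,
                 dimnames = list(paste0("lncRNA", 1:3), paste0("Sample", 1:6)))

# One Pearson correlation test per lncRNA/mRNA pair; keep only the p-value
pvals <- sapply(rownames(mRNA), function(m) {
  sapply(rownames(incRNA), function(l) {
    cor.test(incRNA[l, ], mRNA[m, ])$p.value
  })
})

# pvals is a lncRNA x mRNA matrix. p.adjust() returns a plain vector,
# so adjust all p-values together and restore the matrix shape and names.
padj <- matrix(p.adjust(pvals, method = "BH"),
               nrow = nrow(pvals), dimnames = dimnames(pvals))
```

Adjusting across the whole matrix at once treats every lncRNA/mRNA pair as one family of tests; adjusting per row or per column would be a different design choice.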

Thanks for the script. I tried the above for 8 genes with 8 samples, but encountered some problems. The input files look something like this:

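| | Sample1 | Sample2 | Sample3 | Sample4 | Sample5 | Sample6 | Sample7 | Sample8 |
|---|---|---|---|---|---|---|---|---|
| Gene1 | 0.2 | 0.3 | 0.4 | | | | | |
| Gene2 | | | | | | | | |
| Gene3 | | | | | | | | |
| Gene4 | | | | | | | | |
| Gene5 | | | | | | | | |
| Gene6 | | | | | | | | |
| Gene7 | | | | | | | | |
| Gene8 | | | | | | | | |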

In the output I should get 64 values (8 each for all 8 lncRNA), but I'm getting only 36 values. And the values come in a single column, so which value corresponds to which pair of lncRNA and mRNA?

You should have 2 matrices, not one. And why do you want 64 values back? Correlation between Sample1-mRNA and Sample1-lncRNA is the same as the correlation of Sample1-lncRNA - Sample1-mRNA. And the result you get is a 3 column table where the first two columns give you the sample names.

Sorry for the confusion, but I've 2 matrices, one each for lncRNAs and mRNAs (both similar to the one shown above). I came up with 64 values because we need the correlation between lncRNA gene and mRNA gene expression over the same set of samples.

Adding to what kandoi has said above, I would like to mention that the final output should be an 8*8 correlation matrix (if, for instance, there are 8 lncRNAs and 8 mRNAs). So, essentially, in this exercise the number of samples does not really matter. The number of correlations that we expect should be the number of lncRNAs * the number of mRNAs.
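One way to get that full lncRNA-by-mRNA matrix, as a minimal sketch rather than the answerer's method. It assumes two matrices with features in rows and the same samples in columns, and uses base R's cor() with its default Pearson method. The toy values are made up.

```r
# Toy data (hypothetical): 8 mRNAs and 8 lncRNAs measured in the same 8 samples
set.seed(7)
samples <- paste0("Sample", 1:8)
mRNA <- matrix(rnorm(64), nrow = 8,
               dimnames = list(paste0("mRNA", 1:8), samples))
incRNA <- matrix(rnorm(64), nrow = 8,
                 dimnames = list(paste0("lncRNA", 1:8), samples))

# cor(x, y) correlates the columns of x with the columns of y,
# so transpose both matrices to put the features in columns
cor_mat <- cor(t(incRNA), t(mRNA))  # rows = lncRNAs, columns = mRNAs

# Long format: one row per lncRNA/mRNA pair, with both names next to the value
cor_long <- as.data.frame(as.table(cor_mat), stringsAsFactors = FALSE)
names(cor_long) <- c("lncRNA", "mRNA", "correlation")

dim(cor_mat)    # 8 rows (lncRNAs) by 8 columns (mRNAs)
nrow(cor_long)  # one row per pair, 8 * 8 = 64
```

Because rows and columns index different feature sets, no triangle of this matrix is redundant, and the long table keeps the feature names so each value can be traced back to its pair.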
