Question: Correlation between lncRNAs and mRNA from two microarrays?
4.2 years ago by
kandoigaurav120
United States
kandoigaurav120 wrote:

Hi

I have two microarrays run on the same 19 samples: one profiled for lncRNAs and the other for mRNAs. How can I calculate the correlation between each lncRNA and each mRNA?

Thanks

modified 9 months ago by Biostar ♦♦ 20 • written 4.2 years ago by kandoigaurav120
4.2 years ago by
Irsan7.0k
Amsterdam
Irsan7.0k wrote:

If only 1 sample is tested for each mRNA/lncRNA, there is no way to calculate the correlation between lncRNA and mRNA. Think of a scatter plot with an X-axis (e.g. expression of a particular mRNA), a Y-axis (e.g. expression of a particular lncRNA) and 1 point (one sample). You cannot fit or draw a meaningful line through 1 data point.
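To make this concrete, here is a minimal sketch in base R. All numbers are made up for illustration; the only library fact relied on is that `cor.test()` stops with an error when it gets fewer than three paired observations.

```r
# One sample per feature: a single (x, y) point (made-up values)
single <- tryCatch(
  cor.test(0.2, 0.7),           # one mRNA value vs. one lncRNA value
  error = function(e) e         # capture the error instead of stopping
)
print(conditionMessage(single)) # cor.test() reports that there are too few observations

# With 19 samples per feature the same call works (random toy data)
set.seed(1)
mrna_expr   <- rnorm(19)
lncrna_expr <- rnorm(19)
ok <- cor.test(lncrna_expr, mrna_expr)
print(ok$estimate)              # the Pearson correlation coefficient
```
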

Edit: So you have 19 samples. I assume you have two matrices called mRNA and incRNA in R. The columns are samples and the rows are features (mRNAs or lncRNAs). The column names of the mRNA and incRNA objects should be the same and in the same order.
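A quick way to check that requirement, as a sketch on toy data: the matrix names `mRNA` and `incRNA` follow the assumption above, while the values and the `Gene`/`lncRNA`/`Sample` labels are invented for illustration.

```r
# Toy matrices (made-up values): rows are features, columns are samples
set.seed(1)
samples <- paste0("Sample", 1:8)
mRNA <- matrix(rnorm(64), nrow = 8,
               dimnames = list(paste0("Gene", 1:8), samples))
# Deliberately store the lncRNA samples in a different column order
incRNA <- matrix(rnorm(64), nrow = 8,
                 dimnames = list(paste0("lncRNA", 1:8), rev(samples)))

# Both matrices must cover exactly the same set of samples
stopifnot(setequal(colnames(mRNA), colnames(incRNA)))

# Reorder the lncRNA columns so they match the mRNA column order
incRNA <- incRNA[, colnames(mRNA)]
stopifnot(identical(colnames(mRNA), colnames(incRNA)))
```

If the first `stopifnot()` fails, the two arrays do not describe the same samples and should not be correlated column by column.
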

```
# first you have to transpose your data (samples become rows, features become columns)
mat1 <- t(mRNA)
mat2 <- t(incRNA)
# then do the magic: result has one row per incRNA and one column per mRNA
result <- apply(mat1, 2, function(col_mat1) {
  apply(mat2, 2, function(col2, col1) {
    cor.test(col2, col1)$estimate # this returns the correlation coefficient
  }, col1 = col_mat1)
})
# mRNA1 - incRNA2 correlation is same as incRNA2 - mRNA1 correlation so remove them
# (this only applies to a symmetric matrix; skip it to keep every incRNA-mRNA pair)
result[lower.tri(result)] <- NA
# melt matrix to get in long format
library(reshape)
result <- melt(result)
# and remove the NA-values (the double ones)
result <- na.omit(result)
```
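As an alternative sketch (not the method used in the answer above): base R's `cor()` accepts two matrices and correlates every column of the first with every column of the second, so the full coefficient matrix can be computed in one call. The toy values and the `Gene`/`lncRNA` labels are made up.

```r
# Toy data: 8 mRNAs and 8 lncRNAs measured on the same 8 samples
set.seed(42)
samples <- paste0("Sample", 1:8)
mRNA <- matrix(rnorm(64), nrow = 8,
               dimnames = list(paste0("Gene", 1:8), samples))
incRNA <- matrix(rnorm(64), nrow = 8,
                 dimnames = list(paste0("lncRNA", 1:8), samples))

# cor(x, y) works on columns, so transpose to put features in columns
cor_mat <- cor(t(incRNA), t(mRNA))  # rows: lncRNAs, columns: mRNAs
print(dim(cor_mat))                 # one coefficient per lncRNA-mRNA pair

# Spot check a single pair against cor.test()
check <- cor.test(incRNA["lncRNA2", ], mRNA["Gene5", ])$estimate
print(all.equal(unname(check), cor_mat["lncRNA2", "Gene5"]))
```

This gives only the coefficients; p-values still need `cor.test()` per pair.
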

If you want the p-value for the correlation test, replace $estimate with $p.value, and use p.adjust(...) on your p-values to correct for multiple comparisons.
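A minimal sketch of that variant on toy data follows. The matrix sizes, the random values and the choice of `method = "BH"` (Benjamini-Hochberg) are assumptions made for illustration; `p.adjust()` supports other methods as well.

```r
# Toy data: 5 mRNAs and 4 lncRNAs measured on the same 19 samples
set.seed(7)
samples <- paste0("Sample", 1:19)
mRNA <- matrix(rnorm(5 * 19), nrow = 5,
               dimnames = list(paste0("Gene", 1:5), samples))
incRNA <- matrix(rnorm(4 * 19), nrow = 4,
                 dimnames = list(paste0("lncRNA", 1:4), samples))

# p-value of cor.test() for every lncRNA (rows) against every mRNA (columns)
pvals <- apply(t(mRNA), 2, function(m) {
  apply(t(incRNA), 2, function(l) cor.test(l, m)$p.value)
})

# p.adjust() returns a plain vector, so restore the matrix shape and names
padj <- matrix(p.adjust(pvals, method = "BH"),
               nrow = nrow(pvals), dimnames = dimnames(pvals))
print(round(padj, 3))
```

Adjusting all pairs together matters because the number of tests grows as the number of lncRNAs times the number of mRNAs.
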

I have 19 samples for each microarray.

Thanks for the script. I tried the above for 8 genes with 8 samples, but encountered some problems. The input files look something like this:

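| | Sample1 | Sample2 | Sample3 | Sample4 | Sample5 | Sample6 | Sample7 | Sample8 |
|---|---|---|---|---|---|---|---|---|
| Gene1 | 0.2 | 0.3 | 0.4 | | | | | |
| Gene2 | | | | | | | | |
| Gene3 | | | | | | | | |
| Gene4 | | | | | | | | |
| Gene5 | | | | | | | | |
| Gene6 | | | | | | | | |
| Gene7 | | | | | | | | |
| Gene8 | | | | | | | | |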

In the output I should get 64 values (8 each for all 8 lncRNA), but I'm getting only 36 values. And the values come in a single column, so which value corresponds to which pair of lncRNA and mRNA?

You should have 2 matrices, not one. And why do you want 64 values back? The correlation between Sample1-mRNA and Sample1-lncRNA is the same as the correlation between Sample1-lncRNA and Sample1-mRNA. And the result you get is a 3-column table where the first two columns give you the sample names.

Sorry for the confusion, but I have 2 matrices, one each for lncRNAs and mRNAs (both similar to the one shown above). I came up with 64 values because we need the correlation between lncRNA and mRNA gene expression over the same set of samples.

Adding to what kandoi has said above, I would like to mention that the final output should be an 8*8 correlation matrix (if, for instance, there are 8 lncRNAs and 8 mRNAs). So, essentially, in this exercise the number of samples does not really matter. The number of correlations we expect is the number of lncRNAs * the number of mRNAs.

If you don't execute the code that comes after the apply loops, you have what you asked for.
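A sketch of what that looks like on toy data: stopping after the apply loops leaves the full lncRNA-by-mRNA matrix, and a base-R reshape (used here instead of `melt()` so that no package is needed) keeps the feature names next to every value. The data and the `Gene`/`lncRNA` labels are made up.

```r
# Toy data: 8 mRNAs and 8 lncRNAs measured on the same 8 samples
set.seed(3)
samples <- paste0("Sample", 1:8)
mRNA <- matrix(rnorm(64), nrow = 8,
               dimnames = list(paste0("Gene", 1:8), samples))
incRNA <- matrix(rnorm(64), nrow = 8,
                 dimnames = list(paste0("lncRNA", 1:8), samples))

# Same nested apply as in the answer, stopping before the lower.tri() step
result <- apply(t(mRNA), 2, function(col_mat1) {
  apply(t(incRNA), 2, function(col2, col1) {
    cor.test(col2, col1)$estimate
  }, col1 = col_mat1)
})
print(dim(result))  # lncRNAs in rows, mRNAs in columns

# Long format: one row per pair, with both feature names attached
long <- as.data.frame(as.table(result), responseName = "correlation")
names(long)[1:2] <- c("lncRNA", "mRNA")
print(head(long))
```

Each row of `long` names the lncRNA and the mRNA that a value belongs to, which answers the question of which value corresponds to which pair.
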

Thanks a lot. That worked for us.

You're welcome

Hi Irsan, I have

8 samples, 4 regions upregulated and 4 downregulated mRNA for two groups, Male and Female, and the same data for lncRNA.

I don't know what I should consider as samples, or how I can use my data to analyze this.