Correlation between lncRNA and protein-coding genes
2.8 years ago
Vasu

I have RNA-Seq data for 300 samples, of which 250 are tumor and 50 are normal. I have a matrix with genes as rows and samples as columns.

The matrix has almost 56k genes (rows), and some of them are lncRNAs.

I would like to check the correlation between a specific lncRNA and all protein-coding genes, and I want the correlation coefficient (R) for each pair.

How can I do this for one lncRNA against all protein-coding genes in the genome?

2.8 years ago

If count is your matrix and your lncRNA is in row i, then in R:

# correlation (Pearson by default) of every gene (row) with the lncRNA in row i
lnc_cor <- apply(count, 1, function(x) cor(x, count[i, ]))


You can choose Pearson (the default) or Spearman correlation.
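A minimal sketch of the same idea on simulated data (the matrix, its size and the row index i are made up for illustration). Because cor() also accepts a matrix and a vector, correlating t(count) with the lncRNA's row should give the same coefficients as the apply() loop in a single call:

```r
set.seed(1)
# hypothetical toy matrix: 100 genes (rows) x 30 samples (columns)
count <- matrix(rnorm(100 * 30), nrow = 100,
                dimnames = list(paste0("gene", 1:100), paste0("Sample", 1:30)))
i <- 5  # hypothetical row index of the lncRNA

# row by row, as in the answer above (Pearson is the default method)
cor_loop <- apply(count, 1, function(x) cor(x, count[i, ]))

# vectorised: cor() of a samples x genes matrix with a vector
cor_vec <- cor(t(count), count[i, ])[, 1]

all.equal(cor_loop, cor_vec)  # TRUE
```

Note that the lncRNA is also correlated with itself (cor_loop[i] is 1), so drop that entry before ranking genes.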


So, with this cor, how do I proceed? I'm interested in Spearman correlation.


Set the method argument of cor() (method = "spearman"). See here: https://www.rdocumentation.org/packages/stats/versions/3.5.1/topics/cor
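For reference, a small sketch (simulated vectors, made up for illustration) showing that method = "spearman" is the Pearson correlation of the ranks. This is why it captures monotone but non-linear relationships and is unaffected by a monotone transform such as log2(x + 1):

```r
set.seed(2)
x <- rexp(50)                   # hypothetical lncRNA expression in 50 samples
y <- x^3 + rnorm(50, sd = 0.1)  # hypothetical gene with a monotone, non-linear relation

rho   <- cor(x, y, method = "spearman")
rho_r <- cor(rank(x), rank(y))  # Pearson correlation computed on the ranks
all.equal(rho, rho_r)  # TRUE

# a strictly increasing transform leaves the ranks, and therefore rho, unchanged
all.equal(rho, cor(log2(x + 1), y, method = "spearman"))  # TRUE
```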


Sorry, I'm a bit confused. Let's say I have a matrix A like the one below, with Ensembl IDs as rows and samples as columns. Starting from raw counts, I used the cpm function to convert them to logCPM values:

                       Sample1            Sample2           Sample3          Sample4          Sample5
ENSG00000000003.14        17.146506        16.822596        16.781746        16.932891        16.263722
ENSG00000000005.5          6.782761         7.941372         8.520003         8.241359         7.797734
ENSG00000000419.12        16.279996        16.663848        15.908999        14.737590        15.665799
ENSG00000000457.13        15.347626        15.454124        15.211375        15.686339        16.339990
ENSG00000000460.16        15.546598        15.720200        15.331334        15.262918        15.766690


Now I want to check the correlation of ENSG00000000005.5 with all other Ensembl IDs.

This is just example data. I have a single lncRNA and around 19k protein-coding genes with logCPM values. How do I apply the above function to this, and how do I plot the resulting correlation coefficients (R)?


my_cor <- apply(my_cpm, 1, function(x) cor(x, my_cpm["ENSG00000000005.5", ], method = "spearman"))
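A sketch of the full one-vs-all step on simulated logCPM values (every gene ID other than ENSG00000000005.5, the matrix size and the values are made up). It leaves out the lncRNA's correlation with itself and sorts the genes by the absolute Spearman coefficient, so the strongest positive and negative associations come first. With real data you would first subset my_cpm to the lncRNA plus your ~19k protein-coding genes.

```r
set.seed(3)
lnc_id <- "ENSG00000000005.5"
gene_ids <- c(lnc_id, sprintf("GENE%03d", 1:200))  # hypothetical protein-coding IDs
# hypothetical logCPM matrix: genes x 40 samples
my_cpm <- matrix(rnorm(length(gene_ids) * 40, mean = 10), nrow = length(gene_ids),
                 dimnames = list(gene_ids, paste0("Sample", 1:40)))

lnc    <- my_cpm[lnc_id, ]
others <- my_cpm[rownames(my_cpm) != lnc_id, , drop = FALSE]

# Spearman coefficient of the lncRNA with every other gene
rho <- apply(others, 1, function(x) cor(x, lnc, method = "spearman"))

# strongest associations first, whether positive or negative
rho_sorted <- rho[order(abs(rho), decreasing = TRUE)]
head(rho_sorted)
```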

I don't think plotting the correlation coefficients would be particularly revealing, but you can do it if you want.
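If you do want a picture, one simple option is a histogram of the coefficients. The sketch below uses simulated values as a stand-in for my_cor and writes the plot to a temporary PDF so it also runs non-interactively; both choices are assumptions for illustration.

```r
set.seed(4)
my_cor <- runif(1000, -1, 1)  # hypothetical stand-in for the real coefficients

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
h <- hist(my_cor, breaks = seq(-1, 1, by = 0.1),
          main = "Spearman correlation with ENSG00000000005.5",
          xlab = "Correlation coefficient (rho)", ylab = "Number of genes")
abline(v = 0, lty = 2)  # dashed reference line at zero correlation
invisible(dev.off())
```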


Thanks a lot, I got the correlation coefficient values (R). These tell me whether the lncRNA has a strong, moderate or weak correlation with each protein-coding gene. One small question: what is R squared in correlation, and what does it tell us?


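Short answer: for a single pair of variables (X and Y), the R^2 of a simple linear regression of Y on X is the same as r^2, the square of the Pearson correlation returned by cor(). R^2 becomes more useful in multiple linear regression, where several variables are used simultaneously to predict the response; there it measures the fraction of the variance in the response that the model explains. Note that the identity with r^2 is for Pearson's r; the square of a Spearman coefficient describes the relationship between ranks, not between the original values.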

Reference: Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, An Introduction to Statistical Learning.
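A quick numerical check of both statements on simulated data (the variables and coefficients are made up for illustration). In a simple regression, summary(lm())$r.squared matches the squared Pearson correlation. In a multiple regression, it matches the squared correlation between the response and the fitted values (ordinary least squares with an intercept):

```r
set.seed(5)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 * x1 - x2 + rnorm(n)  # hypothetical response driven by two predictors

# simple regression: R^2 equals the squared Pearson correlation r^2
fit1 <- lm(y ~ x1)
all.equal(summary(fit1)$r.squared, cor(x1, y)^2)  # TRUE

# multiple regression: R^2 equals the squared correlation of y with the fitted values
fit2 <- lm(y ~ x1 + x2)
all.equal(summary(fit2)$r.squared, cor(y, fitted(fit2))^2)  # TRUE
```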
