Question: Correlation between lncRNA and protein coding genes
17 months ago by
Vasu420
Vasu420 wrote:

I have RNA-Seq data for 300 samples, of which 250 are tumor and 50 are normal. I have a matrix with genes as rows and samples as columns.

There are almost 56k genes as rows. Among these genes there are also lncRNAs.

I would like to check the correlation between a specific lncRNA and all protein-coding genes. I want the value of R (the correlation coefficient).

How can I do this for one lncRNA vs. all protein-coding genes in the genome?

lncrna rna-seq correlation genes R • 850 views
ADD COMMENTlink modified 17 months ago by Nicolas Rosewick8.7k • written 17 months ago by Vasu420
17 months ago by
Belgium, Brussels
Nicolas Rosewick8.7k wrote:

If your lncRNA is on the i-th row of the matrix, and count is your matrix, then in R:

cor <- apply(count, 1, function(x) cor(x, count[i, ]))

You may choose Pearson or Spearman for the correlation
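
As a minimal sketch of the point above, the method can be passed explicitly to cor, and the same result can be obtained without apply by handing cor the transposed matrix. The toy matrix, the gene names, and the row index i below are made up for illustration.

```r
# Toy expression matrix: genes as rows, samples as columns (simulated values).
set.seed(1)
count <- matrix(rnorm(6 * 10), nrow = 6,
                dimnames = list(paste0("gene", 1:6), paste0("S", 1:10)))
i <- 2  # hypothetical row index of the lncRNA

# apply() version, with the correlation method made explicit.
r_apply <- apply(count, 1, function(x) cor(x, count[i, ], method = "spearman"))

# Equivalent vectorised call: cor() correlates each column of its first
# argument with the second, so the matrix is transposed first.
r_vec <- cor(t(count), count[i, ], method = "spearman")[, 1]

# Drop the lncRNA's correlation with itself.
r_vec <- r_vec[-i]
```

On a matrix with tens of thousands of rows the vectorised form is usually the faster of the two, but both return one coefficient per gene.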

ADD COMMENTlink written 17 months ago by Nicolas Rosewick8.7k

So, with this cor, how do I proceed further? I'm interested in Spearman correlation.

ADD REPLYlink written 17 months ago by Vasu420

Set the method argument. See here: https://www.rdocumentation.org/packages/stats/versions/3.5.1/topics/cor

ADD REPLYlink modified 17 months ago • written 17 months ago by russhh5.2k

Sorry, I'm a bit confused. Let's say I have a matrix A like the one below, with Ensembl IDs as rows and samples as columns. Starting from raw counts, I used the cpm function to convert them to logCPM values:

                       Sample1            Sample2           Sample3          Sample4          Sample5
ENSG00000000003.14        17.146506        16.822596        16.781746        16.932891        16.263722
ENSG00000000005.5          6.782761         7.941372         8.520003         8.241359         7.797734
ENSG00000000419.12        16.279996        16.663848        15.908999        14.737590        15.665799
ENSG00000000457.13        15.347626        15.454124        15.211375        15.686339        16.339990
ENSG00000000460.16        15.546598        15.720200        15.331334        15.262918        15.766690

Now, in this matrix I want to check the correlation of ENSG00000000005.5 with all other Ensembl IDs.

This is just example data. In reality I have a single lncRNA and around 19k protein-coding genes with logCPM values. How do I apply the above function to this? And how do I plot the result with the R value (correlation coefficient)?

ADD REPLYlink modified 17 months ago • written 17 months ago by Vasu420

my_cor <- apply(my_cpm, 1, function(x) cor(x, my_cpm["ENSG00000000005.5", ], method = "spearman"))

I don't think plotting the correlation coefficients would be particularly revealing, but you can do it if you want.
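
If a plot is wanted anyway, one possible sketch is to rank the coefficients and draw a histogram. The matrix below is rebuilt from the five example rows posted earlier in this thread. The output file is a temporary PDF chosen only for illustration, and with five samples the coefficients are of course not meaningful.

```r
# The five example rows from the post above (logCPM values).
my_cpm <- rbind(
  "ENSG00000000003.14" = c(17.146506, 16.822596, 16.781746, 16.932891, 16.263722),
  "ENSG00000000005.5"  = c( 6.782761,  7.941372,  8.520003,  8.241359,  7.797734),
  "ENSG00000000419.12" = c(16.279996, 16.663848, 15.908999, 14.737590, 15.665799),
  "ENSG00000000457.13" = c(15.347626, 15.454124, 15.211375, 15.686339, 16.339990),
  "ENSG00000000460.16" = c(15.546598, 15.720200, 15.331334, 15.262918, 15.766690)
)
colnames(my_cpm) <- paste0("Sample", 1:5)

lnc <- "ENSG00000000005.5"
my_cor <- apply(my_cpm, 1, function(x) cor(x, my_cpm[lnc, ], method = "spearman"))
my_cor <- my_cor[names(my_cor) != lnc]  # remove the self-correlation

# Rank genes from most positively to most negatively correlated.
ranked <- sort(my_cor, decreasing = TRUE)

# Histogram of all coefficients, written to a temporary PDF file.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
hist(my_cor, breaks = 20, xlim = c(-1, 1),
     main = paste("Spearman correlation with", lnc), xlab = "rho")
invisible(dev.off())
```

With the full set of protein-coding genes, head(ranked) and tail(ranked) would list the most strongly correlated and anti-correlated genes, which is often more useful than the plot itself.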

ADD REPLYlink written 17 months ago by russhh5.2k

Thanks a lot. I got the correlation coefficient values (R). These tell me whether the lncRNA has a strong, moderate or weak correlation with the other protein-coding genes. But I have a small question: what is R squared in correlation? What does R squared tell us?

ADD REPLYlink written 17 months ago by Vasu420

In short: if you have a single pair of variables (X and Y), then R^2 from a simple linear regression of Y on X is the same as r^2, the square of the Pearson correlation returned by cor. However, the usefulness of R^2 becomes apparent in multiple linear regression, where several variables are used simultaneously to predict the response.

Reference:
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. "An Introduction to Statistical Learning."
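
The relationship between r^2 and R^2 can be checked numerically. This is a small sketch on simulated data: x, y and the extra predictor z are invented for illustration, and the identity applies to the Pearson coefficient, not to Spearman.

```r
# Simulated data: y depends linearly on x plus noise; z is an unrelated predictor.
set.seed(42)
x <- rnorm(50)
y <- 0.6 * x + rnorm(50)
z <- rnorm(50)

# Pearson correlation and its square.
r <- cor(x, y)
r_squared <- r^2

# R^2 from a simple linear regression of y on x.
r2_simple <- summary(lm(y ~ x))$r.squared

# R^2 from a multiple regression with two predictors.
r2_multi <- summary(lm(y ~ x + z))$r.squared

# Check that the squared correlation matches the simple-regression R^2.
same <- isTRUE(all.equal(r_squared, r2_simple))
```

Here r_squared and r2_simple agree up to floating-point error, while r2_multi has no single pairwise correlation behind it. It measures the fraction of variance in y explained by all predictors together.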

ADD REPLYlink modified 17 months ago • written 17 months ago by Chirag Parsania1.7k

You should maybe start by reading about statistics before going further into your analysis 😉

ADD REPLYlink written 17 months ago by Nicolas Rosewick8.7k