Question: Correlation between lncRNA and protein coding genes
7 months ago by
Vasu330
Vasu330 wrote:

I have RNA-Seq data for 300 samples, of which 250 are tumor and 50 are normal. I have a matrix with genes as rows and samples as columns.

There are almost 56k genes as rows. Among these genes there are also lncRNAs.

I would like to check the correlation between a specific lncRNA and all protein-coding genes. I want the value of R (the correlation coefficient).

How can I do this for one lncRNA vs. all protein-coding genes in the genome?

lncrna rna-seq correlation genes R • 561 views
modified 7 months ago by Nicolas Rosewick7.7k • written 7 months ago by Vasu330
7 months ago by
Belgium, Brussels
Nicolas Rosewick7.7k wrote:

If your lncRNA is in the ith row of the matrix, and count is your matrix, then in R:

cor <- apply(count,1,function(x){cor(x,count[i,])})

You may choose Pearson or Spearman for the correlation
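
As a rough sketch of the same idea, cor() can also take the whole transposed matrix at once, which avoids the explicit apply loop. The toy matrix, its dimensions and the row index i below are made up for illustration; only the cor() call itself comes from the answer.

```r
# Toy stand-in for the expression matrix: 6 genes (rows) x 10 samples (columns).
# These values are simulated, not real data.
set.seed(1)
count <- matrix(rnorm(60, mean = 10, sd = 2), nrow = 6,
                dimnames = list(paste0("gene", 1:6), paste0("S", 1:10)))
i <- 2  # hypothetical row index of the lncRNA

# Row-by-row version, as in the answer above.
r_apply <- apply(count, 1, function(x) cor(x, count[i, ], method = "pearson"))

# Equivalent vectorised call: cor() correlates each column of its first
# argument with the second argument, so the matrix is transposed first.
r_vec <- cor(t(count), count[i, ], method = "pearson")[, 1]

# Drop the lncRNA's correlation with itself (always 1).
r_other <- r_vec[-i]
```

Genes with no variation across samples (for example all-zero rows) give NA with a warning, so it may be worth filtering those out first.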

written 7 months ago by Nicolas Rosewick7.7k

So, with this cor, how do I proceed further? I'm interested in doing a Spearman correlation.

written 7 months ago by Vasu330

Set the method argument. See here: https://www.rdocumentation.org/packages/stats/versions/3.5.1/topics/cor

modified 7 months ago • written 7 months ago by russhh4.4k

Sorry, I'm a bit confused. Let's say I have a matrix A like the one below, with Ensembl IDs as rows and samples as columns. Starting from raw counts, I used the cpm function to convert them to logCPM values:

                       Sample1            Sample2           Sample3          Sample4          Sample5
ENSG00000000003.14        17.146506        16.822596        16.781746        16.932891        16.263722
ENSG00000000005.5          6.782761         7.941372         8.520003         8.241359         7.797734
ENSG00000000419.12        16.279996        16.663848        15.908999        14.737590        15.665799
ENSG00000000457.13        15.347626        15.454124        15.211375        15.686339        16.339990
ENSG00000000460.16        15.546598        15.720200        15.331334        15.262918        15.766690

Now, in this matrix I want to check the correlation of ENSG00000000005.5 with all other Ensembl IDs.

This is just example data. In reality I have a single lncRNA and around 19k protein-coding genes with logCPM values. How do I apply the above function to this? And how do I plot the result with R (the correlation coefficient value)?

modified 7 months ago • written 7 months ago by Vasu330

my_cor <- apply(my_cpm, 1, function(x){cor(x, my_cpm["ENSG00000000005.5",], method = "spearman")})

I don't think plotting the correlation coefficients would be particularly revealing, but you can do it if you want.
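
For what it's worth, here is a small sketch of that call run on the five example rows posted above, followed by a ranking by absolute correlation. The names lnc_id and ranked are just illustrative, and with only five samples the coefficients themselves mean very little.

```r
# The example logCPM values from the post above (5 genes x 5 samples).
my_cpm <- rbind(
  "ENSG00000000003.14" = c(17.146506, 16.822596, 16.781746, 16.932891, 16.263722),
  "ENSG00000000005.5"  = c( 6.782761,  7.941372,  8.520003,  8.241359,  7.797734),
  "ENSG00000000419.12" = c(16.279996, 16.663848, 15.908999, 14.737590, 15.665799),
  "ENSG00000000457.13" = c(15.347626, 15.454124, 15.211375, 15.686339, 16.339990),
  "ENSG00000000460.16" = c(15.546598, 15.720200, 15.331334, 15.262918, 15.766690)
)
colnames(my_cpm) <- paste0("Sample", 1:5)

lnc_id <- "ENSG00000000005.5"  # row name of the gene of interest

# Spearman correlation of every row against the chosen row.
my_cor <- apply(my_cpm, 1, function(x) {
  cor(x, my_cpm[lnc_id, ], method = "spearman")
})

# Remove the self-correlation and sort by strength, ignoring sign.
my_cor <- my_cor[names(my_cor) != lnc_id]
ranked <- my_cor[order(abs(my_cor), decreasing = TRUE)]
print(ranked)
```

If a plot is still wanted, hist(my_cor) on the full set of coefficients is probably the simplest overview.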

written 7 months ago by russhh4.4k

Thanks a lot. I got the correlation coefficient values (R). These tell me whether the lncRNA has a strong, moderate or weak correlation with each protein-coding gene. But I have a small question: what is R squared in correlation? What does R squared tell you?

written 7 months ago by Vasu330

Short description: if you have a single pair of variables (X and Y), then R^2 from the regression of Y on X and r^2 (the square of the Pearson coefficient returned by cor) are the same. However, the power of R^2 comes into the picture in multiple linear regression, where multiple variables are used simultaneously to predict the response.

Reference: Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. “An Introduction to Statistical Learning.”
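
A quick way to check the statement above is to compare the two numbers on simulated data. This is only an illustrative sketch: x, y and z are made-up variables, and the equality holds for the default Pearson coefficient (a squared Spearman coefficient would not generally match the R^2 of a regression on the raw values).

```r
# Simulated data: y depends linearly on x plus noise; z is unrelated noise.
set.seed(42)
x <- rnorm(50)
y <- 2 * x + rnorm(50)
z <- rnorm(50)

# Pearson correlation between the single pair of variables.
r <- cor(x, y)

# R^2 from the simple linear regression of y on x.
fit <- lm(y ~ x)
r_squared <- summary(fit)$r.squared

# With one predictor the two quantities agree up to rounding.
print(all.equal(r^2, r_squared))

# With several predictors there is no single r to square;
# R^2 then summarises how well all predictors together explain y.
fit2 <- lm(y ~ x + z)
r_squared_multi <- summary(fit2)$r.squared
```

Adding a predictor never lowers R^2 in ordinary least squares, which is why adjusted R^2 is often reported alongside it for multiple regression.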

modified 7 months ago • written 7 months ago by Chirag Parsania1.4k

You should maybe start by reading about statistics before going further into your analysis 😉

written 7 months ago by Nicolas Rosewick7.7k