Question: Correlation between lncRNA and protein coding genes
Bioinfo wrote:

I have RNA-Seq data for 300 samples, of which 250 are tumor and 50 are normal. I have a matrix with genes as rows and samples as columns.

There are almost 56k genes as rows. Among these genes there are also lncRNAs.

I would like to check the correlation between a specific lncRNA and all protein-coding genes. I want the value of R (the correlation coefficient).

How can I do this for one lncRNA vs all protein-coding genes in the genome?

Tags: lncrna, rna-seq, correlation, genes, R
written 7 weeks ago by Bioinfo
Nicolas Rosewick wrote:

If count is your matrix in R and your lncRNA is in the i-th row, then:

lnc_cor <- apply(count, 1, function(x) cor(x, count[i, ]))

You may choose Pearson or Spearman for the correlation.
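As a minimal sketch of the idea above, the following uses a small simulated matrix (not real data) and an assumed row index i = 2 for the lncRNA. It also shows an equivalent vectorised call: cor() correlates the columns of a matrix with a vector, so transposing the matrix avoids the explicit apply loop.

```r
# Toy matrix: genes as rows, samples as columns (simulated values, for illustration only)
set.seed(1)
count <- matrix(rnorm(5 * 10), nrow = 5,
                dimnames = list(paste0("gene", 1:5), paste0("S", 1:10)))
i <- 2  # assumed row index of the lncRNA

# Row-by-row version: correlate every gene with row i
r_apply <- apply(count, 1, function(x) cor(x, count[i, ]))

# Vectorised version: cor() works on columns, so put genes in columns with t()
r_vec <- cor(t(count), count[i, ])[, 1]

# Drop the lncRNA's correlation with itself (always 1)
r_others <- r_vec[-i]
```

Both versions return a named vector with one coefficient per gene; on a 56k-row matrix the vectorised call is usually noticeably faster.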

written 7 weeks ago by Nicolas Rosewick

So, with this result, how do I proceed further? I'm interested in doing a Spearman correlation.

written 7 weeks ago by Bioinfo

Set the method argument. See here: https://www.rdocumentation.org/packages/stats/versions/3.5.1/topics/cor

modified 7 weeks ago • written 7 weeks ago by russhh

Sorry, I'm a bit confused. Let's say I have a matrix A like the one below, with Ensembl IDs as rows and samples as columns. Starting from raw counts, I used the cpm function to convert them to logCPM values:

                       Sample1            Sample2           Sample3          Sample4          Sample5
ENSG00000000003.14        17.146506        16.822596        16.781746        16.932891        16.263722
ENSG00000000005.5          6.782761         7.941372         8.520003         8.241359         7.797734
ENSG00000000419.12        16.279996        16.663848        15.908999        14.737590        15.665799
ENSG00000000457.13        15.347626        15.454124        15.211375        15.686339        16.339990
ENSG00000000460.16        15.546598        15.720200        15.331334        15.262918        15.766690

Now I want to check the correlation of ENSG00000000005.5 with all other Ensembl IDs.

This is just example data. I have a single lncRNA and around 19k protein-coding genes with logCPM values. How do I apply the above function to this? And how do I plot the result with the R value (correlation coefficient)?

modified 7 weeks ago • written 7 weeks ago by Bioinfo

my_cor <- apply(my_cpm, 1, function(x) cor(x, my_cpm["ENSG00000000005.5", ], method = "spearman"))

I don't think plotting the correlation coefficients would be particularly revealing, but you can do it if you want.
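For anyone who does want to try it, here is a rough sketch using the five example rows posted above. The names lnc_id and pc_ids are made up for illustration; pc_ids is assumed to hold the protein-coding gene IDs (here simply every other row), which in a real analysis would come from a gene annotation.

```r
# Example logCPM values copied from the post above (5 genes x 5 samples)
my_cpm <- rbind(
  "ENSG00000000003.14" = c(17.146506, 16.822596, 16.781746, 16.932891, 16.263722),
  "ENSG00000000005.5"  = c( 6.782761,  7.941372,  8.520003,  8.241359,  7.797734),
  "ENSG00000000419.12" = c(16.279996, 16.663848, 15.908999, 14.737590, 15.665799),
  "ENSG00000000457.13" = c(15.347626, 15.454124, 15.211375, 15.686339, 16.339990),
  "ENSG00000000460.16" = c(15.546598, 15.720200, 15.331334, 15.262918, 15.766690)
)
colnames(my_cpm) <- paste0("Sample", 1:5)

lnc_id <- "ENSG00000000005.5"
# Assumption: pc_ids lists the protein-coding genes (here, all remaining rows)
pc_ids <- setdiff(rownames(my_cpm), lnc_id)

# Spearman correlation of the lncRNA against each protein-coding gene
my_cor <- cor(t(my_cpm[pc_ids, ]), my_cpm[lnc_id, ], method = "spearman")[, 1]

# Rank genes from most positively to most negatively correlated
my_cor <- sort(my_cor, decreasing = TRUE)

# Distribution of the coefficients across genes
hist(my_cor, breaks = 20, xlab = "Spearman rho",
     main = "lncRNA vs protein-coding genes")

# Scatter plot of the lncRNA against its top-ranked gene
top_gene <- names(my_cor)[1]
plot(my_cpm[lnc_id, ], my_cpm[top_gene, ], xlab = lnc_id, ylab = top_gene,
     main = sprintf("rho = %.2f", my_cor[1]))
```

With only five samples the coefficients mean very little; the same code applies unchanged to the full 300-sample matrix.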

written 7 weeks ago by russhh

Thanks a lot. I got the correlation coefficient values (R). These tell me whether the lncRNA has a strong, moderate or weak correlation with the protein-coding genes. But I have a small question: what is R squared in correlation, and what does it tell me?

written 7 weeks ago by Bioinfo

In short: for a single pair of variables (X and Y), the R^2 of the simple linear regression of Y on X is the same as r^2, the square of the Pearson correlation returned by cor. R^2 becomes more useful in multiple linear regression, where several variables are used simultaneously to predict the response.

Reference :
Excerpt From: Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. “An Introduction to Statistical Learning.” iBooks.
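A quick way to check this relationship yourself, sketched on simulated data (x, y and z are made-up variables, not expression values):

```r
set.seed(42)
x <- rnorm(50)
y <- 0.6 * x + rnorm(50)  # simulated response

# Pearson correlation (the default method of cor) and its square
r <- cor(x, y)

# R^2 from the simple linear regression of y on x
r2 <- summary(lm(y ~ x))$r.squared

all.equal(r^2, r2)  # TRUE: the two quantities coincide for one predictor

# With a second predictor, R^2 describes the whole model rather than any
# single pairwise correlation
z <- rnorm(50)
r2_multi <- summary(lm(y ~ x + z))$r.squared
```

Note that this identity holds for the Pearson coefficient; squaring a Spearman coefficient does not give the R^2 of a linear regression on the original values.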

modified 7 weeks ago • written 7 weeks ago by Chirag Parsania

You should maybe start by reading about statistics before going further into your analysis 😉

written 7 weeks ago by Nicolas Rosewick