Question

miRNA-mRNA expression correlation

7.2 years ago

Emilio Marmol ▴ 170

Hello everyone. I have two data.frames with expression data from an RNA-seq experiment: one with expression values for selected miRNAs across 24 samples, and another with expression values for selected genes across the same 24 samples.

Each data.frame was created by selecting the miRNAs or mRNAs that showed differential expression with DESeq2, filtered by FDR < 0.05. Expression values were obtained by applying the cpm function from the edgeR package to raw counts, which were obtained with featureCounts. These cpm values were calculated before the normalisation performed by DESeq2.

Example from mRNA cpm data:

mRNA                   sample6        sample8         sample87        sample139   
ENSSSCG00000013396  4.9226133236    7.2400541062    3.6369306772    5.0415819189
ENSSSCG00000022687  16.0221597119   13.9341369192    2.530038732    2.9623893757
ENSSSCG00000021638  61.0593383407   82.4891410464   13.8022648681   10.7087615941
ENSSSCG00000013397  5.2776094767    5.1511204625    3.0947795203    4.8023827767
ENSSSCG00000016338  10.6498845943   7.9284526934    12.2435802921   11.5183586906
ENSSSCG00000008171  6.3425979362    6.0294221081    1.6942223651    1.9135931371
ENSSSCG00000010464  222.0855934065  256.3928668898  437.7870591546  191.7273123892
ENSSSCG00000023714  22.7197538012   42.2771684039   16.8970443884   18.7127328887
ENSSSCG00000024527  12.1645348477   15.3346719758   76.1948271686   53.2494090263
ENSSSCG00000017986  9.5848961349    11.133066806    57.1743574159   42.2462484881

Example from miRNA cpm data:

miRNA              sample6         sample8         sample87       sample139
ssc-miR-1285    36.2788665777   37.6145686343   2286.6900268583 34.3905779882
ssc-miR-339      1.2596828673   4.4514282408    4.9803454225    2.5163837552
ssc-miR-421-5p  22.1704184641   6.8997137732    3.5573895875    13.211014715
ssc-miR-374a-3p 136.2976862397  115.5145628475  69.7248359154   155.3866968856
ssc-miR-129a-3p 6.8022874833    25.1505695602   40.5542412977   6.7103566806
ssc-miR-296-5p  5.542604616     13.1317133102   38.4198075452   8.8073431433
ssc-miR-7       307.3626196163  274.2079796303  152.2562743459  337.6148204938

cpm values were obtained via this R function from the edgeR package:

y2 <- cpm(x, normalized.lib.sizes=FALSE)

where x is the table of raw counts from featureCounts, with no prior normalisation applied.
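As a rough illustration of what that call computes, here is a minimal base R sketch. It assumes x is a plain count matrix, in which case (to my understanding) the library size used with normalized.lib.sizes=FALSE is simply the column total. The count values and the names geneA, s1, etc. are made up for the example.

```r
# Hypothetical raw count matrix: 3 features x 2 samples (values are made up).
x <- matrix(c(10, 90, 900,    # counts in sample s1
              20, 30, 950),   # counts in sample s2
            nrow = 3,
            dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Library size = total counts per sample (column sums).
lib_size <- colSums(x)

# Counts per million: divide each column by its library size, then scale by 1e6.
cpm_manual <- t(t(x) / lib_size) * 1e6

print(cpm_manual)
```

Because every column is rescaled to sum to one million, cpm removes sequencing-depth differences but does not correct for composition effects the way TMM or DESeq2 size factors do.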

I would like to correlate miRNA and mRNA expression levels and select the pairs with a negative correlation, since miRNAs repress their target genes: a highly expressed miRNA should lower the expression of its targets, and a repressed miRNA should allow it to rise.

I've used the corr.test() function from the R package psych to get Spearman and Pearson correlation matrices, with correlations and FDR-corrected p-values, but I would like to know which test (Spearman/Kendall or Pearson) would be the most appropriate. I tend to think that Spearman should be the chosen one, as the distribution of the expression data in each sample is not normal, but I've seen some papers using simple Pearson correlation. Given my data, what would be the best approach?

Do you know any other way to get this done? For instance, regression (I'm not sure how to set up a regression correctly with this data). Is there a package that solves this particular problem? Any other statistical approach?
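For reference, the same kind of miRNA-by-mRNA test table can be sketched in base R with cor.test() and p.adjust(). This is only an illustration, not the psych implementation: it uses a few rows of the CPM values posted above, and with four samples the p-values mean very little. The object names (mirna, mrna, pairs, negative) are my own.

```r
samples <- c("sample6", "sample8", "sample87", "sample139")

# Subset of the posted CPM values (two miRNAs, two mRNAs).
mirna <- rbind(
  "ssc-miR-339"    = c(1.2596828673, 4.4514282408, 4.9803454225, 2.5163837552),
  "ssc-miR-421-5p" = c(22.1704184641, 6.8997137732, 3.5573895875, 13.211014715)
)
mrna <- rbind(
  "ENSSSCG00000022687" = c(16.0221597119, 13.9341369192, 2.530038732, 2.9623893757),
  "ENSSSCG00000024527" = c(12.1645348477, 15.3346719758, 76.1948271686, 53.2494090263)
)
colnames(mirna) <- colnames(mrna) <- samples

# One row per miRNA-mRNA pair.
pairs <- expand.grid(miRNA = rownames(mirna), mRNA = rownames(mrna),
                     stringsAsFactors = FALSE)

# Spearman test for every pair, across samples.
tests <- lapply(seq_len(nrow(pairs)), function(i) {
  cor.test(mirna[pairs$miRNA[i], ], mrna[pairs$mRNA[i], ], method = "spearman")
})
pairs$rho <- sapply(tests, function(res) unname(res$estimate))
pairs$p   <- sapply(tests, function(res) res$p.value)

# Benjamini-Hochberg correction over all tested pairs.
pairs$fdr <- p.adjust(pairs$p, method = "BH")

# Keep the negatively correlated pairs (add an FDR cutoff with real data).
negative <- pairs[pairs$rho < 0, ]
print(negative)
```

With the full 24-sample tables, the same loop applies unchanged; swapping method = "spearman" for "pearson" or "kendall" gives the other coefficients.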

Thanks.

RNA-Seq R miRNA mRNA correlation • 4.1k views

ADD COMMENT • link updated 7.2 years ago by Benn 8.3k • written 7.2 years ago by Emilio Marmol ▴ 170

+1 for selecting the appropriate test before you do it instead of just taking the one that worked the best.

ADD REPLY • link 7.2 years ago by Asaf 10k

score 1 · Answer 1 · 2017-02-09

7.2 years ago

Benn 8.3k

Use Pearson if you expect a linear correlation between the two, and Spearman for a monotonic correlation. In other words, if the line is straight: Pearson, if the line is not straight: try Spearman.
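A toy example of that distinction, using made-up vectors rather than expression data: a straight line gives a perfect Pearson correlation, while a curved but strictly increasing relationship keeps Spearman at 1 and pulls Pearson down.

```r
x <- 1:10
y_linear   <- 3 * x + 2   # straight line
y_monotone <- exp(x)      # strictly increasing, but strongly curved

pearson_linear    <- cor(x, y_linear,   method = "pearson")
pearson_monotone  <- cor(x, y_monotone, method = "pearson")
spearman_monotone <- cor(x, y_monotone, method = "spearman")

print(c(pearson_linear    = pearson_linear,
        pearson_monotone  = pearson_monotone,
        spearman_monotone = spearman_monotone))
```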

PS. I would use log2 transformed data

ADD COMMENT • link 7.2 years ago by Benn 8.3k

log2-transformed raw counts, or log2-transformed cpm data? I assume that with cpm my data is already scaled, since cpm expresses each gene or miRNA relative to one million counts in each sample.

Honestly, I do not know what type of correlation to expect. The function of a miRNA is to repress its target mRNAs, so when a miRNA is overexpressed, the mRNAs it targets should show decreased expression, and vice versa, but I'm not sure whether this relationship can be treated as linear.

ADD REPLY • link 7.2 years ago by Emilio Marmol ▴ 170

log2(cpm+0.5) I would suggest.

As a rule of thumb, if Spearman's rho is clearly larger than Pearson's r, the relationship is probably monotonic but not linear. See the first answer here:

http://stats.stackexchange.com/questions/8071/how-to-choose-between-pearson-and-spearman-correlation

ADD REPLY • link 7.2 years ago by Benn 8.3k

And why use the log2(cpm+0.5) transformation rather than plain cpm? I've seen that the Spearman rho values are larger than the Pearson r values, so I will assume a non-linear correlation.

ADD REPLY • link 7.2 years ago by Emilio Marmol ▴ 170

It is wise to transform expression values to the log scale. This was already standard practice with microarrays many years ago. The reason is that log-transformed data are closer to normally distributed than untransformed data, and extreme values have less influence.
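A small sketch of what the transformation changes, using one miRNA and one mRNA row from the CPM tables in the question (four samples only, so purely illustrative). Since log2 is monotonic, the ranks and therefore Spearman's rho stay the same; Pearson's r shifts because the extreme value in sample87 is compressed.

```r
# CPM values copied from the question (sample6, sample8, sample87, sample139).
mir  <- c(36.2788665777, 37.6145686343, 2286.6900268583, 34.3905779882)   # ssc-miR-1285
gene <- c(222.0855934065, 256.3928668898, 437.7870591546, 191.7273123892) # ENSSSCG00000010464

# log2(cpm + 0.5): the 0.5 offset avoids log2(0) for features with zero counts.
log_mir  <- log2(mir + 0.5)
log_gene <- log2(gene + 0.5)

spearman_raw <- cor(mir, gene, method = "spearman")
spearman_log <- cor(log_mir, log_gene, method = "spearman")
pearson_raw  <- cor(mir, gene, method = "pearson")
pearson_log  <- cor(log_mir, log_gene, method = "pearson")

# Fold difference between the largest and smallest miRNA value, before and after.
spread_raw <- max(mir) / min(mir)
spread_log <- max(log_mir) / min(log_mir)

print(c(spearman_raw = spearman_raw, spearman_log = spearman_log,
        pearson_raw = pearson_raw, pearson_log = pearson_log))
```

So the choice of transformation matters for Pearson (and for regression), but not for rank-based coefficients such as Spearman or Kendall.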

ADD REPLY • link 7.2 years ago by Benn 8.3k

score 0 · Answer 2 · 2017-02-09

Since gene expression is complex and usually only modestly influenced by miRNAs, you will most likely not see a strong correlation between an mRNA and its regulating miRNA. Another, less stringent, approach would be to split the samples into two groups, one in which the miRNA is expressed at low levels and one in which it is highly expressed, and compare the mRNA levels between these two groups (I think a Wilcoxon rank-sum test would fit here).
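A minimal sketch of that two-group comparison in base R. The data are simulated, not from the thread, and the median split is my own assumption about how to define the "low" and "high" groups; other cutoffs (for example top versus bottom third) would work the same way.

```r
set.seed(1)
n <- 24  # same number of samples as in the question

# Simulated miRNA CPM values (hypothetical).
mirna_expr <- rlnorm(n, meanlog = 3, sdlog = 1)

# Simulated target mRNA on the log2 scale: decreases as the miRNA rises, plus noise.
mrna_log2 <- 8 - 0.5 * log2(mirna_expr + 0.5) + rnorm(n, sd = 0.5)

# Split samples at the median miRNA expression.
group <- ifelse(mirna_expr > median(mirna_expr), "high", "low")

# Wilcoxon rank-sum test: mRNA level in miRNA-high vs miRNA-low samples.
res <- wilcox.test(mrna_log2[group == "high"], mrna_log2[group == "low"])

print(table(group))
print(res$p.value)
```

When this is repeated over many miRNA-mRNA pairs, the resulting p-values need the same multiple-testing correction as the correlation tests.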