Hello everyone. I have two data.frames with expression data obtained from an RNA-seq experiment: one with expression values for selected miRNAs across 24 samples, and another with expression values for selected genes across the same 24 samples.
Each data.frame was created by selecting the miRNAs or mRNAs that showed differential expression with DESeq2, filtered at FDR < 0.05. Expression values were obtained by applying the cpm function from the edgeR package to raw counts, which were obtained via FeatureCounts. These cpm values were calculated before the normalisation performed by DESeq2.
Example from mRNA cpm data:
mRNA                sample6         sample8         sample87        sample139
ENSSSCG00000013396  4.9226133236    7.2400541062    3.6369306772    5.0415819189
ENSSSCG00000022687  16.0221597119   13.9341369192   2.530038732     2.9623893757
ENSSSCG00000021638  61.0593383407   82.4891410464   13.8022648681   10.7087615941
ENSSSCG00000013397  5.2776094767    5.1511204625    3.0947795203    4.8023827767
ENSSSCG00000016338  10.6498845943   7.9284526934    12.2435802921   11.5183586906
ENSSSCG00000008171  6.3425979362    6.0294221081    1.6942223651    1.9135931371
ENSSSCG00000010464  222.0855934065  256.3928668898  437.7870591546  191.7273123892
ENSSSCG00000023714  22.7197538012   42.2771684039   16.8970443884   18.7127328887
ENSSSCG00000024527  12.1645348477   15.3346719758   76.1948271686   53.2494090263
ENSSSCG00000017986  9.5848961349    11.133066806    57.1743574159   42.2462484881
Example from miRNA cpm data:
miRNA            sample6         sample8         sample87         sample139
ssc-miR-1285     36.2788665777   37.6145686343   2286.6900268583  34.3905779882
ssc-miR-339      1.2596828673    4.4514282408    4.9803454225     2.5163837552
ssc-miR-421-5p   22.1704184641   6.8997137732    3.5573895875     13.211014715
ssc-miR-374a-3p  136.2976862397  115.5145628475  69.7248359154    155.3866968856
ssc-miR-129a-3p  6.8022874833    25.1505695602   40.5542412977    6.7103566806
ssc-miR-296-5p   5.542604616     13.1317133102   38.4198075452    8.8073431433
ssc-miR-7        307.3626196163  274.2079796303  152.2562743459   337.6148204938
The cpm values were obtained via this R function from the edgeR package:
y2 <- cpm(x, normalized.lib.sizes=FALSE)
where x is the table of raw counts from FeatureCounts, with no prior normalisation applied.
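To make explicit what those cpm values represent, here is a minimal base R sketch of the same arithmetic. It assumes x is a plain count matrix (genes in rows, samples in columns), in which case cpm() with normalized.lib.sizes=FALSE should amount to dividing each column by its total count and multiplying by one million. The count matrix below is made up purely for illustration and is not taken from the experiment.

```r
# Hypothetical raw-count matrix (genes x samples); the values are invented.
x <- matrix(c(10, 0, 90,
              200, 300, 500),
            nrow = 3,
            dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Library size = total raw counts per sample (column sums).
lib_size <- colSums(x)

# Counts per million: divide each column by its library size, scale by 1e6.
# t(x) puts samples in rows so the division recycles lib_size per sample.
cpm_manual <- t(t(x) / lib_size) * 1e6

print(cpm_manual)
```

Each column of cpm_manual sums to one million, so the values only correct for sequencing depth and not for composition differences between samples.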
I would like to correlate miRNA and mRNA expression levels and select the pairs with negative correlation, since miRNAs act as inhibitors of gene expression when they are expressed and, in effect, as enhancers of gene expression when they are repressed.
I've used the corr.test() function from the R package psych to get Spearman and Pearson correlation matrices, with correlations and FDR-corrected p-values, but I would like to know which test (Spearman/Kendall or Pearson) would be the most appropriate approach. I tend to think Spearman should be the one to choose, as the expression data in each sample are not normally distributed, but I've seen some papers that use simple Pearson correlation. Given my data, what would be the best approach to take?
Do you know of any other method to get this done? For instance, regression (I'm not sure about the correct way to implement regression with this data). Is there any package that addresses this particular problem, or any other statistical approach?
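For reference, the kind of analysis described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the exact psych-based code I ran: it uses cor.test() and p.adjust() instead of corr.test() so that it runs without extra packages, and the two matrices are simulated toy data (hypothetical names miR_1, gene_1, etc.) with one negatively related pair planted on purpose.

```r
# Toy data only: simulated CPM-like values, NOT the real experiment.
set.seed(1)
n_samples <- 24
samples <- paste0("sample", seq_len(n_samples))
mirna <- matrix(rlnorm(3 * n_samples, meanlog = 3), nrow = 3,
                dimnames = list(paste0("miR_", 1:3), samples))
mrna <- matrix(rlnorm(4 * n_samples, meanlog = 3), nrow = 4,
               dimnames = list(paste0("gene_", 1:4), samples))

# Plant one pair where the gene decreases monotonically as the miRNA increases.
mrna["gene_1", ] <- 1000 / mirna["miR_1", ]

# Enumerate every miRNA x mRNA pair.
pairs <- expand.grid(miRNA = rownames(mirna), mRNA = rownames(mrna),
                     stringsAsFactors = FALSE)

# Spearman correlation and p-value for each pair (asymptotic p-value).
res <- lapply(seq_len(nrow(pairs)), function(i) {
  ct <- cor.test(mirna[pairs$miRNA[i], ], mrna[pairs$mRNA[i], ],
                 method = "spearman", exact = FALSE)
  c(rho = unname(ct$estimate), p = ct$p.value)
})
pairs <- cbind(pairs, do.call(rbind, res))

# Benjamini-Hochberg FDR correction across all tested pairs.
pairs$fdr <- p.adjust(pairs$p, method = "BH")

# Keep only significantly negative miRNA-mRNA pairs.
negative_hits <- pairs[pairs$rho < 0 & pairs$fdr < 0.05, ]
print(negative_hits)
```

Changing method = "spearman" to "pearson" or "kendall" gives the other coefficients on the same pairs, which makes it easy to compare how much the choice of test changes the selected pairs.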