Hi, I have performed differential expression (DE) analysis to identify DE miRNAs between 2 conditions (4 replicates each), and did the same for mRNA. The purpose is to correlate the two data frames and detect negative correlations between the miRNA and mRNA datasets in order to predict targets. Does anyone have a helpful tutorial about this? I know how to perform a correlation in R, but I can't figure out what format I should use. The 2 datasets differ in length (number of rows) and I don't know how to get the desired output. (The counts here are normalised counts.)
miRNA dataframe:
names SH_1 SH_2 SH_3 SH_4 treatment_1 treatment_2 treatment_3 treatment_4
mmu-miR-1 7648.00 5864.84 8105.39 8198.10 1038.50 682.71 1067.37 1007.00
mmu-miR-2 4676.32 3384.97 3610.91 3756.65 1204.76 767.89 1011.92 919.11
mmu-miR-3 95.32 81.25 73.36 84.91 18.35 11.57 14.34 13.65
mmu-miR-4 332.17 237.58 279.33 327.81 63.45 46.26 59.28 65.92
mmu-miR-5 845.84 652.74 699.92 769.34 183.08 215.15 259.11 268.98
mmu-miR-6 167.72 119.72 115.00 131.28 47.01 24.60 40.16 29.84
..etc (67 miRs)
mRNA dataframe: same but has 211 genes
The desired output:
Gene miRNA p-value correlation
gene.A miRb 0.005196 -0.9999667
gene.B miRa 0.005261 -0.999966659
gene.N miRN 0.00658 -0.9999473
Hi, thank you for your answer, and for the interesting information regarding the threshold! I will keep that in mind. Could you explain what you mean by "can we find differentially expressed miRNAs that are predicted to target differentially expressed genes, and don't bother with what we expect the effect size to be, beyond the correct sign."
Code-wise, what does this mean exactly? Do you mean that negative correlation can produce some false positives?
Also, this line throws an error:
all_corrs <- sapply(rownames(miRNA_mat), per_miRNA, simplify=FALSE)
"Error during wrapup: 'x' must be a numeric vector Error: no more error handlers available (recursive errors?); invoking 'abort' restart"
Actually, I'd guess the opposite is more likely: because the relationship is not linear, it would not be detected by Pearson's correlation.
We do differential expression analysis separately on genes and on miRNAs. Then, to call a gene a target of a miRNA, we predict targets for the differentially expressed miRNAs among the differentially expressed genes, only taking forward those where a significantly downregulated gene is the target of a significantly upregulated miRNA, or vice versa.
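A minimal sketch of that filtering step in R. Everything here is an assumption for illustration, not part of our actual pipeline: the data frames `mirna_de`, `gene_de` and `targets`, their column names, and the values are all made up. `targets` stands in for the output of whichever target prediction tool you use.

```r
# Hypothetical DE results (names and values invented for illustration)
mirna_de <- data.frame(miRNA = c("miR-a", "miR-b"),
                       log2FC = c(2.1, -1.8))
gene_de <- data.frame(gene = c("gene.A", "gene.B", "gene.C"),
                      log2FC = c(-1.5, 1.2, 2.0))

# Hypothetical predicted miRNA-gene target pairs
targets <- data.frame(miRNA = c("miR-a", "miR-a", "miR-b", "miR-b"),
                      gene = c("gene.A", "gene.C", "gene.B", "gene.D"))

# Keep only predicted pairs where both the miRNA and the gene are DE
pred <- merge(targets, mirna_de, by = "miRNA")
pred <- merge(pred, gene_de, by = "gene", suffixes = c("_miRNA", "_gene"))

# Keep pairs whose fold changes point in opposite directions
candidates <- pred[sign(pred$log2FC_miRNA) != sign(pred$log2FC_gene), ]
candidates
```

Note that no effect size enters the filter, only the sign of each fold change.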
There were a couple of typos in the code (rownanes -> rownames & coors -> corrs), but I'm guessing the most likely cause of your error is that your input data are not numeric matrices. Perhaps they are data frames, or character matrices?
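If the input is a data frame laid out like the one in the question (a `names` column followed by sample columns), a conversion along these lines should give a numeric matrix with row names. This is a sketch on a toy data frame, so `miRNA_df` and the reduced set of columns are assumptions:

```r
# Toy data frame mimicking the layout in the question (first column holds names)
miRNA_df <- data.frame(
  names = c("mmu-miR-1", "mmu-miR-2"),
  SH_1 = c(7648.00, 4676.32),
  SH_2 = c(5864.84, 3384.97),
  treatment_1 = c(1038.50, 1204.76),
  treatment_2 = c(682.71, 767.89)
)

# Drop the name column before converting, so the matrix stays numeric,
# then reattach the names as rownames
miRNA_mat <- as.matrix(miRNA_df[, -1])
rownames(miRNA_mat) <- miRNA_df$names

is.numeric(miRNA_mat)  # should be TRUE before running any correlation
```

Calling `as.matrix` on the whole data frame, name column included, would give a character matrix instead, which is one common way to end up with the "must be a numeric vector" error.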
Hi, thank you for your answers! What you said makes sense, but I have seen countless papers saying they used Pearson correlation for the same purpose, which is why I'm trying to use it as well!
My main problem is that the two inputs are different in length: I have 60 DE-miRNAs but around 200 DE-genes, and everything I try fails for that reason. I can't run a correlation test on them like that.
It shouldn't matter that they are different lengths. They would only need to be the same length if you were testing miRNA1 against gene1, miRNA2 against gene2, etc., whereas in fact you are testing all len(miRNAs)*len(mRNAs) pairs. Only the samples (columns) need to match between the two datasets.
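To make the all-pairs idea concrete, here is a minimal sketch on random toy matrices. It is not the code posted earlier in the thread. The matrix sizes, the row names, and the Benjamini-Hochberg adjustment are my own assumptions. It also assumes the sample columns are in the same order in both matrices.

```r
set.seed(1)
samples <- c(paste0("SH_", 1:4), paste0("treatment_", 1:4))

# Random toy matrices: 3 miRNAs and 5 genes measured over the same 8 samples
miRNA_mat <- matrix(rnorm(3 * 8), nrow = 3,
                    dimnames = list(paste0("miR", 1:3), samples))
mRNA_mat <- matrix(rnorm(5 * 8), nrow = 5,
                   dimnames = list(paste0("gene", 1:5), samples))

# One row per miRNA-gene pair: 3 * 5 = 15 pairs
res <- expand.grid(miRNA = rownames(miRNA_mat), gene = rownames(mRNA_mat),
                   stringsAsFactors = FALSE)

# Pearson correlation test for every pair (cor.test defaults to Pearson)
tests <- Map(function(m, g) cor.test(miRNA_mat[m, ], mRNA_mat[g, ]),
             res$miRNA, res$gene)
res$correlation <- vapply(tests, function(x) unname(x$estimate), numeric(1))
res$p.value <- vapply(tests, function(x) x$p.value, numeric(1))

# Many pairs are tested, so adjust the p-values for multiple testing
res$padj <- p.adjust(res$p.value, method = "BH")

# Keep negatively correlated pairs, most significant first
negative <- res[res$correlation < 0, ]
negative <- negative[order(negative$p.value), ]
```

With 67 miRNAs and 211 genes the same code would produce 67 * 211 = 14137 rows. If you only need the coefficients, `cor(t(miRNA_mat), t(mRNA_mat))` returns the whole miRNA-by-gene matrix in one call.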
Right, unfortunately the function is not working. Is there another way to do this? It insists that my input is not numeric even though it is!
If R thinks your input is not numeric, then every other way is also going to fail.
What is the output of str, class and mode for your miRNA_mat and mRNA_mat inputs? For instance, for the inputs I tested the code on:
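As an illustration only (this is a small made-up matrix, not the actual test input), the three checks on a usable numeric matrix would look roughly like this:

```r
# Made-up numeric matrix, for illustration only
m <- matrix(c(7648.00, 4676.32, 5864.84, 3384.97), nrow = 2,
            dimnames = list(c("mmu-miR-1", "mmu-miR-2"), c("SH_1", "SH_2")))

str(m)    # compact summary; a usable input is reported as "num"
class(m)  # "matrix" "array" in R >= 4.0.0, where matrices also inherit from "array"
mode(m)   # "numeric"

# A character matrix, by contrast, reports mode "character"
bad <- matrix(c("7648.00", "4676.32"), nrow = 1)
mode(bad)
```

If `mode` reports "character" or "list" for either input, the correlation code will fail however it is written.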
Hi, for my matrix:
Not sure why it shows "array" as well.
The way this code is written, you need the miRNA names as the rownames of the matrix. At the moment there are no rownames.
You could rewrite the above code to use row numbers rather than row names, or simply add row names.
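Both options, sketched on a toy matrix. The values are borrowed from the table in the question, and the per-row `mean` is just a stand-in for whatever function you apply to each miRNA:

```r
# Toy matrix with no rownames, as in the situation described above
miRNA_mat <- matrix(c(95.32, 332.17, 81.25, 237.58, 18.35, 63.45), nrow = 2)
is.null(rownames(miRNA_mat))  # TRUE: there is nothing to look rows up by

# Option 1: add row names (use your real miRNA IDs, in the same row order)
rownames(miRNA_mat) <- c("mmu-miR-3", "mmu-miR-4")

# Option 2: iterate over row numbers instead of row names
row_means <- sapply(seq_len(nrow(miRNA_mat)), function(i) mean(miRNA_mat[i, ]))
```

Option 1 is usually the more convenient one, since the names then carry through to the results.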