Calculate statistically significant miRNA and mRNA target pairs using Pearson's correlation
3.1 years ago
shaden ▴ 20

Hi, I have performed differential expression (DE) analysis to identify DE miRNAs between 2 conditions (4 replicates each), and the same for mRNA. The purpose is to correlate the two data frames and detect negative correlations between the miRNA and mRNA datasets to predict targets. Does anyone have a helpful tutorial about this? I know how to perform a correlation in R, but I can't figure out what format I should use. The 2 datasets have different numbers of rows and I don't know how to get the desired output. (The counts here are normalised counts.)

miRNA dataframe:

names   SH_1    SH_2    SH_3    SH_4    treatment_1 treatment_2 treatment_3 treatment_4
mmu-miR-1   7648.00 5864.84 8105.39 8198.10 1038.50 682.71  1067.37 1007.00
mmu-miR-2   4676.32 3384.97 3610.91 3756.65 1204.76 767.89  1011.92 919.11
mmu-miR-3   95.32   81.25   73.36   84.91   18.35   11.57   14.34   13.65
mmu-miR-4   332.17  237.58  279.33  327.81  63.45   46.26   59.28   65.92
mmu-miR-5   845.84  652.74  699.92  769.34  183.08  215.15  259.11  268.98
mmu-miR-6   167.72  119.72  115.00  131.28  47.01   24.60   40.16   29.84

..etc (67 miRs)

mRNA dataframe: same but has 211 genes

The desired output:

Gene-miRNA- p-value- correlation

gene.A miRb : 0.005196 -0.9999667

gene.B miRa : 0.005261 -0.999966659

Gene.N miRN: 0.00658 -0.9999473

miRNA correlation pearson miRNA-mRNA • 1.4k views
3.1 years ago

Code-wise, this is fairly easy. I'm sure there are more elegant ways, but:

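# Expects two numeric matrices in the workspace, miRNA_mat and mRNA_mat,
# with miRNA/gene IDs as rownames and the same samples, in the same order, as columns.

# Pearson correlation test for one mRNA/miRNA pair
get_cor <- function(mRNA, miRNA) {
    target_corr <- cor.test(miRNA_mat[miRNA, ], mRNA_mat[mRNA, ], method = "pearson")
    return(data.frame(gene = mRNA,
                      miRNA = miRNA,
                      p.value = target_corr$p.value,
                      correlation = target_corr$estimate))
}

# Test one miRNA against every mRNA
per_miRNA <- function(miRNA) {
    this_miRNA_corrs <- sapply(rownames(mRNA_mat), get_cor, miRNA = miRNA, simplify = FALSE)
    this_miRNA_corrs <- do.call(rbind, this_miRNA_corrs)
    return(this_miRNA_corrs)
}

# Test every miRNA against every mRNA, then correct for multiple testing
all_corrs <- sapply(rownames(miRNA_mat), per_miRNA, simplify = FALSE)
all_corrs <- do.call(rbind, all_corrs)
all_corrs$padj <- p.adjust(all_corrs$p.value, method = "BH")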

You'll need your data in two matrices, miRNA_mat and mRNA_mat. You probably want to ask yourself if this is the analysis you want to do. Firstly, this analysis assumes that both miRNA and mRNA counts are normally distributed, which they are not. It also assumes that there is a linear relationship between the amount of miRNA and the effect on the mRNA (e.g. that if you double the amount of miRNA, you'll have twice the effect on mRNA), but this is not a known fact. Indeed, modeling efforts thus far suggest that miRNA acts more like a linear rectifier: when target expression level is below a certain threshold they completely inhibit expression, and above this threshold they have very little effect on target expression. See: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3163764/

For these reasons, we normally prefer to do a categorical analysis: can we find differentially expressed miRNAs that are predicted to target differentially expressed genes? We don't bother with what we expect the effect size to be, beyond the correct sign.


Hi, thank you for your answer, interesting information regarding the threshold! I will keep that in mind. Could you explain what you mean by "can we find differentially expressed miRNAs that are predicted to target differentially expressed genes, and don't bother with what we expect the effect size to be, beyond the correct sign"?

Code-wise, what does this mean exactly? do you mean negative correlation can produce some false positives?

Also: this line comes up with an error: all_corrs <- sapply(rownames(miRNA_mat), per_miRNA, simplify=FALSE)

"Error during wrapup: 'x' must be a numeric vector Error: no more error handlers available (recursive errors?); invoking 'abort' restart"


Code-wise, what does this mean exactly? do you mean negative correlation can produce some false positives?

Actually, I'd guess more likely the opposite: because the relationship is not linear, it would not be detected by Pearson's correlation.

Could you explain what you mean by...

We do differential expression analysis separately on genes and on miRNAs. Then, to call a gene a target of an miRNA, we predict targets for the differentially expressed miRNAs among the differentially expressed genes, only taking forward those pairs where a significantly downregulated gene is the target of a significantly upregulated miRNA, or vice versa.
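As a rough illustration of that filtering step, here is a minimal sketch in R. Everything in it is a placeholder: the two DE tables, the column names (log2FC, padj), the 0.05 cutoff and the predicted table (standing in for the output of whatever target prediction tool you use) are assumptions for the example, not part of the original analysis.

```r
# Toy DE results; all values are made up for illustration.
mirna_de <- data.frame(
  miRNA  = c("miR-a", "miR-b", "miR-c"),
  log2FC = c(2.1, -1.8, 0.2),
  padj   = c(0.001, 0.01, 0.60)
)
gene_de <- data.frame(
  gene   = c("gene.A", "gene.B", "gene.C"),
  log2FC = c(-1.5, 1.2, -2.0),
  padj   = c(0.002, 0.03, 0.04)
)

# Hypothetical predicted miRNA-target pairs (one row per predicted interaction).
predicted <- data.frame(
  miRNA = c("miR-a", "miR-a", "miR-b", "miR-c"),
  gene  = c("gene.A", "gene.B", "gene.B", "gene.C")
)

alpha <- 0.05  # assumed significance cutoff
sig_mirna <- mirna_de[mirna_de$padj < alpha, ]
sig_gene  <- gene_de[gene_de$padj < alpha, ]

# Keep predicted pairs where both partners are differentially expressed ...
pairs <- merge(predicted, sig_mirna, by = "miRNA")
pairs <- merge(pairs, sig_gene, by = "gene", suffixes = c(".miRNA", ".gene"))

# ... and where they change in opposite directions.
candidate_targets <- pairs[sign(pairs$log2FC.miRNA) != sign(pairs$log2FC.gene), ]
candidate_targets
```

Only the direction of change is used here, so no assumption is made about how strongly the miRNA affects its target.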

Also: this line comes up with error

There were a couple of typos in the code as first posted (rownanes -> rownames and coors -> corrs), but I'm guessing the most likely cause of your error is that your input data are not numeric matrices. Perhaps they are data frames, or character matrices?
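To illustrate how that can happen, here is a small sketch. It assumes the data were read into a data frame (called miRNA_df here, a made-up name) whose first column, names, holds the miRNA IDs, as in the table in the question; only a few of the values shown there are reused.

```r
# Toy data frame shaped like the one in the question.
miRNA_df <- data.frame(
  names       = c("mmu-miR-1", "mmu-miR-2"),
  SH_1        = c(7648.00, 4676.32),
  SH_2        = c(5864.84, 3384.97),
  treatment_1 = c(1038.50, 1204.76),
  treatment_2 = c(682.71, 767.89)
)

# Converting the whole data frame drags the ID column along,
# so every cell is coerced to text.
bad_mat <- as.matrix(miRNA_df)
mode(bad_mat)   # "character"

# Drop the ID column before converting, then reuse it as row names.
miRNA_mat <- as.matrix(miRNA_df[, -1])
rownames(miRNA_mat) <- as.character(miRNA_df$names)

stopifnot(is.numeric(miRNA_mat))
str(miRNA_mat)
```

The same steps would apply to the mRNA table.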


Hi, thank you for your answers! Actually what you said makes sense, but I have seen countless papers saying they used Pearson correlation for the same purpose, which is why I'm trying to use it as well!

My main problem is that the two input datasets are different in length: I have 60 DE-miRNAs but around 200 DE-genes, and everything I'm trying is failing for that reason. I can't run a correlation test on them like that.


It shouldn't matter that they are different lengths. They would only need to be the same length if you were testing miRNA1 against gene1, miRNA2 against gene2, etc., whereas in fact you are testing all len(miRNAs)*len(mRNAs) pairs. What does have to match is the samples: both matrices need the same columns in the same order.
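To make the "all pairs" point concrete, here is a sketch using simulated stand-in data (3 miRNAs, 5 genes, 8 shared samples; the names and values are invented). It is an alternative to looping with cor.test: cor() computes every gene-by-miRNA correlation in one call, and the p-values are then derived from the usual t statistic for Pearson's r, which should agree with what cor.test reports.

```r
set.seed(1)

# Simulated stand-ins: different numbers of rows, same 8 samples as columns.
samples   <- paste0("sample_", 1:8)
miRNA_mat <- matrix(rnorm(3 * 8), nrow = 3,
                    dimnames = list(paste0("miR_", 1:3), samples))
mRNA_mat  <- matrix(rnorm(5 * 8), nrow = 5,
                    dimnames = list(paste0("gene_", 1:5), samples))

# cor() correlates columns, so transpose to put samples in rows.
# The result is a 5 x 3 matrix: one row per gene, one column per miRNA.
r <- cor(t(mRNA_mat), t(miRNA_mat), method = "pearson")

# Two-sided p-values from the t statistic with n - 2 degrees of freedom.
n      <- ncol(miRNA_mat)
t_stat <- r * sqrt((n - 2) / (1 - r^2))
p      <- 2 * pt(-abs(t_stat), df = n - 2)

# Flatten to one row per gene/miRNA pair.
all_pairs <- data.frame(
  gene        = rownames(r)[as.vector(row(r))],
  miRNA       = colnames(r)[as.vector(col(r))],
  correlation = as.vector(r),
  p.value     = as.vector(p)
)
all_pairs$padj <- p.adjust(all_pairs$p.value, method = "BH")
head(all_pairs)
```

With 3 miRNAs and 5 genes this gives 15 rows; with 67 miRNAs and 211 genes it would give 67 * 211 rows in the same way.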


Right, the function is not working unfortunately. Is there another way to do this? It insists that my input is not numeric even though it is!


is there another way to do this

If R thinks your input is not numeric, then every other way is also going to fail.

What is the output of str, class and mode for your miRNA_mat and mRNA_mat inputs?

For instance, for the inputs I tested the code on:

> str(miRNA_mat)
 num [1:10, 1:10] -1.485 1.087 -0.7747 -1.5643 0.0169 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:10] "miRNA 1" "miRNA 2" "miRNA 3" "miRNA 4" ...
  ..$ : chr [1:10] "sample 1" "sample 2" "sample 3" "sample 4" ...
> class(miRNA_mat)
[1] "matrix"
> mode(miRNA_mat)
[1] "numeric"

Hi, For my matrix:

str(miRNA_mat)

 num [1:66, 1:12] 7648 4676.3 95.3 332.2 845.8 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:12] "SH_1" "SH_2" "SH_3" "SH_4" ...

 class(miRNA_mat)

[1] "matrix" "array" 

mode(miRNA_mat)

[1] "numeric"

not sure why it shows "array" as well.


The way this code is written, you need the miRNA names as the rownames of the matrix. At the moment there are no rownames (the first dimnames entry in your str output is NULL).

You could rewrite the above code to use row numbers rather than row names, or simply add row names.
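Adding row names could look something like the sketch below. The small matrix mimics one without row names, and miRNA_names is a hypothetical vector holding the IDs from the names column of your original table, in the same row order as the matrix.

```r
# Toy matrix without row names (values taken from the table in the question).
miRNA_mat <- matrix(c(7648.00, 4676.32, 95.32,
                      5864.84, 3384.97, 81.25),
                    nrow = 3,
                    dimnames = list(NULL, c("SH_1", "SH_2")))

# Hypothetical: the IDs kept from the original table, in the same row order.
miRNA_names <- c("mmu-miR-1", "mmu-miR-2", "mmu-miR-3")

# Guard against a silent mismatch between IDs and rows.
stopifnot(length(miRNA_names) == nrow(miRNA_mat))

if (is.null(rownames(miRNA_mat))) {
  rownames(miRNA_mat) <- miRNA_names
  # If the IDs are not available, placeholder labels based on the row number
  # would also work: paste0("miRNA_", seq_len(nrow(miRNA_mat)))
}

# Name-based indexing, as used in get_cor, now works.
miRNA_mat["mmu-miR-2", ]
```

The mRNA matrix needs row names in the same way, since the code loops over rownames(mRNA_mat) as well.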
