Determining Pearson correlation coefficients and significance (p-value) for miRNA and mRNA pairs using R
2
0
Entering edit mode
7.7 years ago
v.baaskarla ▴ 40

Hi guys,

I'm having trouble in figuring out the correct method in R to calculate statistically significant miRNA and mRNA target pairs based on pearsons correlation coefficients and p-value (p < 0.05).

My data looks like this:

miRNA logFC gene logFC

miR156d-3p -2.322077683 gene.17474899 0.34139474

miR156d-3p -2.322077683 gene.17436642 0.329760915

miR156d-3p -2.322077683 gene.17430283 -0.416809156

miR156d-3p -2.322077683 gene.17430250 -0.643149662

miR156d-3p -2.322077683 gene.17451144 -0.9889254

miR156j -10.06813372 gene.17473577 1.075675282

miR156j -10.06813372 gene.17437315 0.900629506

miR156j -10.06813372 gene.17425609 0.692140529

My desired output:

Gene name miRNA p-value correlation

gene.A miR396b 0.005196 -0.9999667

gene.B miR396a 0.005261 -0.999966659

Gene.N miR167 0.00658 -0.9999473

Looking forward to your response.

V

RNA-Seq sequencing R • 4.1k views
ADD COMMENT
1
Entering edit mode

The correlation of what ? You need more than a pair of observation to compute a correlation...

ADD REPLY
3
Entering edit mode
7.7 years ago

Have you looked at the cor.test(x,y)) function in base R ? To test the correlation of one miRNA with one gene target you can do the following:

### for one correlation, you need the vectors of the log2FC over time
geneX_over_time=c(2,-5,9,-18,23,2)  # The 6 log2FC values for one gene for the different times.
miRNAX_over_time=c(-3,2,0,18,1,-2)  # idem for the associated miRNA
cor.test(geneX_over_time, miRNAX_over_time)

Now you can use apply() to test all pairs at once but you need to create matrix of gene expression over time. For instance :

#matrix exemple (myMat) # matrix with the expression of the miRNA and associated genes overtime.
colnames(myMat) # the names of the columns
> Gene    log2FC_T1    log2FC_T2    [•••]    log2FC_T6    miRNA    log2FC_T1    [•••] log2FC_T6

apply(myMat, 1, function(x) cor.test(x[2:7], x[8:13]) 
# We apply cor.test() on every row of myMat, testing the correlation between the log2FC of the genes and associated miRNA over time.

You might also use the argument alternative="less" to test only for anti-correlation.

ADD COMMENT
1
Entering edit mode

Yup. Here's a ling to the function description: cor.test You just need the time points as arrays for your pairs.

ADD REPLY
0
Entering edit mode

Thanks for the Info Carlo and Dan.

I think I'm having an issue with the format of my x and y matrices:

My current matrix format:

gene_matrix:

Gene G1.gene G2.gene

gene.17424043 1.713389541 2.028486076

gene.17425117 -0.907690381 -0.986083479

gene.17426455 1.061153312 0.559967003

mirna_matrix:

miRNA G1.mirna G2.mirna

miR157d-3p -3.190930261 3.449600932

miR167f-3p 3.17673357 -2.832038598

miR167f-3p 3.17673357 -2.832038598

miR171n 7.149974447 -9.551883688

using cor.test (cor.test(gene_matrix, mirna_matrix, alternative = "less", method = "pearson"))

My output looks like this:

Pearson's product-moment correlation

data: gene_matrix and mirna_matrix t = -0.94396, df = 32, p-value = 0.1761 alternative hypothesis: true correlation is less than 0 95 percent confidence interval: -1.0000000 0.1286039 sample estimates: cor -0.1645934

As it is comparing the entire x and y matrices.

I'm not able to figure out how to use apply() function to do the pairwise comparisons to get the desired output.

I'd greatly appreciate if you can comment on the issues.

V

ADD REPLY
2
Entering edit mode

I edited my answer to add more details on the input needed for the function. Since it is mostly a data manipulation issue I suggest you run through some R tutorials, things will become easier then.

ADD REPLY
2
Entering edit mode

Carlo updated his answer well. Your issue is that you are passing the function two matrices and not vectors. You can specify a particular column of a matrix though as a vector with the $operator and the column name.

ADD REPLY
0
Entering edit mode

Thanks Carlo and Dan.

Your inputs were really helpful and I was able to get the desired results.

Appreciate it.

ADD REPLY
1
Entering edit mode

Hi Guys,

I managed to write a small script to do the task:

Script:

data <- read.table("Data_file.txt", sep = "\t", header = T, stringsAsFactors = F)
data[data == 0] <- 0.01

data1<-as.matrix(data[,2:7])
rownames(data1)<-data[,1]

data2<-as.matrix(data[,9:14])
rownames(data2)<-data[,8]

output<-matrix(dim(data)[1]*4,dim(data)[1],4)

for (i in dim(data)[1]){
  r<-cor.test(data1[i,],data2[i,], method="pearson")
  output[i,3]<-r$p.value
  output[i,4]<-r$estimate
  output[i,1]<-data[i,1] # target geneID
  output[i,2]<-data[i,8] # miRNAID
}
colnames(output)<-c("geneID","miRNAID","p-val","corr")
head(output) # check the result, if you are happy with them, then save it
write.csv(output,file="output.csv")

But I'm having issues with my output:

I'm only able to get the results for last row as shown in the data table below.

geneID miRNAID p-val corr

28 28 28 28

28 28 28 28

gene.17472635 miR164b-3p 0.055208247 0.801471021

I'd greatly appreciate your suggestions in this regard.

V

ADD REPLY
1
Entering edit mode

Found the issue:

i in 1:dim(data)[1]

V

ADD REPLY
0
Entering edit mode

Glad you could figure it out

ADD REPLY
2
Entering edit mode
7.7 years ago
DG 7.3k

Do you have multiple experiments in order to do the comparisons with? You can't calculate a correlation between a pair unless you have multiple observations (samples) to base that off of.

ADD COMMENT
1
Entering edit mode

Thanks for the reply Dan.

I should've been more thorough with the question.

Yes. I have multiple time points. The above example is one of the six time points. I would like to correlate the expression of miRNA:mRNA pairs to predict expression patterns of miRNAs and their targets.

Similar to the work done in the following article: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4509933/

ADD REPLY

Login before adding your answer.

Traffic: 2934 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6