How to analyze the Pearson correlation coefficient of mRNA abundance between two biological replicates?
Can I use deeptools?
How to analyze the Pearson correlation coefficient of modification sites of nucleotide in mRNA?
I have already got the modification sites, but I do not know how to analyze the correlation between samples.
"the Pearson correlation coefficient of mRNA abundance". I don't think this would be a good idea, because mRNA abundance (that is, I guess, read count per gene) follows a long-tailed distribution, so a handful of very highly expressed genes would dominate and skew the correlation. Instead, I would use either Spearman correlation on the read counts per gene, or Pearson correlation on (r)log-transformed read counts per gene.
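To make the difference concrete, here is a minimal sketch in plain Python comparing Pearson on raw counts, Pearson on log-transformed counts, and Spearman on raw counts. The counts in `rep1` and `rep2` are made up for illustration, and `log2(count + 1)` is only a simple stand-in for a proper transformation such as rlog, not the same thing.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def log_pearson(x, y, pseudocount=1.0):
    """Pearson on log2(count + pseudocount); a simple stand-in for rlog."""
    lx = [math.log2(v + pseudocount) for v in x]
    ly = [math.log2(v + pseudocount) for v in y]
    return pearson(lx, ly)

# Hypothetical read counts per gene for two replicates (illustration only).
rep1 = [0, 3, 12, 45, 150, 800, 20000]
rep2 = [1, 5, 9, 60, 130, 950, 15000]

print("Pearson, raw counts: ", pearson(rep1, rep2))
print("Pearson, log2(x + 1):", log_pearson(rep1, rep2))
print("Spearman, raw counts:", spearman(rep1, rep2))
```

On raw counts the single gene with about 20000 reads largely determines the Pearson value, whereas the log-transformed and rank-based versions give every gene a comparable weight.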
"Can I use deeptools?" I think it is possible to use deeptools for that purpose with the bamCorrelate (renamed multiBamSummary in deepTools 2.0 and later) and plotCorrelation functions. Another approach, perhaps more common, is to calculate the number of reads per gene, feed that into DESeq2 or edgeR, normalize and transform the data (with the rlog transformation, for instance), then calculate the correlation.
"How to analyze the Pearson correlation coefficient of modification sites of nucleotide in mRNA?" Well, in that case I would not use a correlation, because you do not have a continuous variable (it is just presence/absence, from what I understand). Instead I would make a Venn diagram and run Fisher's exact test to see whether there is significant overlap between the modified sites in sample A and sample B.
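As a rough sketch of the overlap test, the following plain-Python example builds the 2x2 table behind the Venn diagram and computes a one-sided Fisher's exact p-value. The site sets and the `universe` of candidate positions are hypothetical; in practice the universe (for example, all positions with sufficient coverage in both samples) has to be chosen carefully, because it determines the "neither" cell and therefore the p-value.

```python
from math import comb

def overlap_table(sites_a, sites_b, universe):
    """2x2 counts: (in both, only in A, only in B, in neither)."""
    if not (sites_a <= universe and sites_b <= universe):
        raise ValueError("all sites must belong to the universe")
    both = len(sites_a & sites_b)
    only_a = len(sites_a - sites_b)
    only_b = len(sites_b - sites_a)
    neither = len(universe) - both - only_a - only_b
    return both, only_a, only_b, neither

def fisher_exact_greater(both, only_a, only_b, neither):
    """One-sided Fisher's exact test: P(overlap >= observed) under
    the hypergeometric null of independent site selection."""
    n_a = both + only_a
    n_b = both + only_b
    total = both + only_a + only_b + neither
    max_overlap = min(n_a, n_b)
    numerator = sum(
        comb(n_a, k) * comb(total - n_a, n_b - k)
        for k in range(both, max_overlap + 1)
    )
    return numerator / comb(total, n_b)

# Hypothetical data: sites are (transcript, position) pairs.
universe = {("gene1", pos) for pos in range(1, 9)}   # 8 candidate sites
sites_a = {("gene1", pos) for pos in (1, 2, 3, 4)}   # modified in sample A
sites_b = {("gene1", pos) for pos in (1, 2, 3, 5)}   # modified in sample B

table = overlap_table(sites_a, sites_b, universe)
p_value = fisher_exact_greater(*table)
print("both, only A, only B, neither:", table)
print("one-sided p-value:", p_value)
```

With real data the universe would contain many thousands of positions; this toy table is deliberately tiny so the arithmetic can be checked by hand.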
Thanks a lot for your suggestions.
I agree that it is better to use the read counts per gene rather than mRNA abundance to calculate the Pearson correlation.
However, I am trying to reproduce a paper (https://doi.org/10.1016/j.molp.2018.01.008), in which I found that they analyze the Pearson correlation of mRNA abundance in Fig.1(B).
Now I understand that this is acceptable, but that it would have been much better if they had used the read counts per gene.
Besides, I have got continuous variables for my experiment. Can I calculate the Spearman correlation using cor() in R?
From the figure, I can tell that they calculated the correlation on the log of mRNA abundance, which makes more sense. I haven't read the paper in detail, but it is unclear to me what they call "mRNA abundance". From what I see, it could very well be read counts per gene.
"Besides, I have got continuous variables for my experiment. Can I calculate the Spearman correlation using cor() in R?"
What exactly do you wish to correlate? The number of modifications? The location? ...?
Sorry for the confusion. I want to correlate the locations of nucleotide modification sites in mRNA between different samples.