Based on a recent bioinformatics analysis of TCGA data, our goal is to perform a correlation analysis to identify putative miRNA-mRNA expression pairs. In detail, from RNA-seq data we have identified a small signature of 17 genes, along with 4 putative miRNA regulators of these genes, found by querying a specific database. The main issue in performing the actual correlation analysis is that the two numeric vectors, which represent the same samples, have been transformed differently: the gene expression data are log2-transformed CPM values (TMM-normalized, with a prior count added to handle small values), whereas the miRNA expression data for the same samples are log2 RPM values and also contain NA values. Thus, my main questions are the following:
A) Is it possible to use these expression values directly, combine them, and perform the correlation analysis? And are RPM essentially the same as CPM? I found the following post, but it does not make this clear: "Difference between CPM and RPM for RNA-seq reada quantification"
B) If any transformations are essential for these two inputs, which would be most appropriate?
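To make the setup concrete, here is a minimal sketch of the kind of per-pair correlation I have in mind. Everything in it is an assumption for illustration: the toy values, the variable names, and the helper `spearman_pairwise` are hypothetical, and Spearman is used only because it depends on ranks rather than on the exact log scale of either vector. Samples with an NA in either vector are dropped for that pair.

```python
import math

def rank(values):
    """Return 1-based average ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def spearman_pairwise(gene, mirna):
    """Spearman correlation over samples where both values are present.

    Returns (rho, number_of_samples_used).
    """
    pairs = [
        (g, m) for g, m in zip(gene, mirna)
        if g is not None and m is not None
        and not math.isnan(g) and not math.isnan(m)
    ]
    if len(pairs) < 3:
        return float("nan"), len(pairs)
    g, m = zip(*pairs)
    return pearson(rank(list(g)), rank(list(m))), len(pairs)

# Hypothetical toy data: one gene (log2 CPM) and one miRNA (log2 RPM)
# measured on the same six samples; the third miRNA value is missing.
gene_log2cpm = [5.1, 6.3, 4.8, 7.0, 5.9, 6.6]
mirna_log2rpm = [9.2, 8.1, float("nan"), 7.5, 8.4, 7.9]

rho, n_used = spearman_pairwise(gene_log2cpm, mirna_log2rpm)
print(f"rho = {rho:.3f} using {n_used} samples")
```

In the real analysis this would be repeated for each of the 17 x 4 gene-miRNA combinations, with the number of samples actually used reported next to each coefficient, since the NA pattern may differ between miRNAs.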
Thank you in advance,