I have been working with an RNA-Seq dataset composed of 3 wt libraries and 9 mutant libraries (3 different conditions, 3 replicates each). I would now like to find out whether any genes, and which ones, are highly correlated with a certain gene (or set of genes) in terms of their expression level across all the tested conditions.
I computed the pairwise Pearson correlations and the associated p-values (adjusted for multiple testing), but I was wondering whether it would be possible, and more appropriate, to use a GLM-type approach to extract the set of most highly correlated genes. Or would that be redundant?
In addition, how could I apply such a methodology?
So far, all I have done is read in my expression table (one line per gene, with 12 columns of normalized expression values, log2 FPKM + 1), transpose it, and then, for the gene of interest (geneX), apply the cor.test function to assess the correlation between geneX and every other gene in the matrix.
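For reference, a minimal R sketch of that procedure. The simulated matrix, the gene and library names, and the choice of BH for the multiple-testing adjustment are placeholders for illustration only.

```r
# Hypothetical stand-in for the real table: genes in rows, 12 libraries in
# columns (3 wt + 3 mutant conditions x 3 replicates), values already normalized.
set.seed(1)
libs <- c(paste0("wt_", 1:3), paste0("mut", rep(1:3, each = 3), "_", 1:3))
expr <- matrix(rnorm(50 * 12, mean = 5), nrow = 50,
               dimnames = list(paste0("gene", 1:50), libs))
# With real data this would be something like (file name is an assumption):
# expr <- as.matrix(read.table("expression.txt", header = TRUE, row.names = 1))

expr_t <- t(expr)                  # libraries in rows, genes in columns
gene_of_interest <- "gene1"        # placeholder for geneX
others <- setdiff(colnames(expr_t), gene_of_interest)

# Pearson correlation test of geneX against every other gene
res <- t(sapply(others, function(g) {
  ct <- cor.test(expr_t[, gene_of_interest], expr_t[, g], method = "pearson")
  c(r = unname(ct$estimate), p = ct$p.value)
}))
res <- as.data.frame(res)

# Adjust for multiple testing (BH is an assumed choice) and rank by |r|
res$padj <- p.adjust(res$p, method = "BH")
res <- res[order(-abs(res$r)), ]
head(res)
```

With only 12 libraries, each test has 10 degrees of freedom, so the ranking by |r| is probably more informative than the adjusted p-values on their own.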
Any help would be really valuable right now.
Thank you very much for your insights.