I would like some clarification of terminology regarding a detail of gene coexpression network construction. Let's say I have two RNA-seq datasets, each containing n replicates, and each representing sequencing data from the same biological system under one of two experimental conditions. How should I construct the data matrix for input to something like WGCNA if I want to analyze gene coexpression networks across experimental conditions/interventions?
What I imagine is that each row of the matrix represents data from one gene, and each column represents data collected from one of the replicates in an experimental condition. So for example, one particular row of the matrix would look like this:
             c1R1 ... c1Rn  c2R1 ... c2Rn
    gene x  [val, ... val,  val, ... val]
Where the first column, c1R1, corresponds to the data from the first experimental replicate in the first condition, and the last column, c2Rn, corresponds to the nth experimental replicate in the second condition.
For coexpression analysis, each row is then correlated with every other row in a pairwise fashion, an adjacency matrix is constructed from the resulting correlations, and further analyses such as module detection can then be conducted on that adjacency matrix.
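To make the layout concrete, here is a minimal sketch of what I have in mind, written in Python with NumPy rather than with WGCNA itself. The expression values are random placeholders, the sizes are arbitrary, and the soft-thresholding power `beta` is an assumed value chosen only for illustration; the adjacency is the unsigned form, |cor|^beta.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_reps = 5, 4  # hypothetical sizes, for illustration only

# Placeholder expression values: one genes-by-replicates block per condition.
cond1 = rng.normal(size=(n_genes, n_reps))
cond2 = rng.normal(size=(n_genes, n_reps))

# Combined matrix: rows are genes, columns are c1R1..c1Rn followed by c2R1..c2Rn.
expr = np.hstack([cond1, cond2])
columns = [f"c{c}R{r}" for c in (1, 2) for r in range(1, n_reps + 1)]

# Pairwise Pearson correlation between rows (np.corrcoef treats rows as variables).
corr = np.corrcoef(expr)

# Unsigned soft-threshold adjacency; beta is an assumed power, not a recommendation.
beta = 6
adjacency = np.abs(corr) ** beta

print(columns)
print(adjacency.shape)
```

Here `expr` has one row per gene and 2n columns, and `adjacency` is a symmetric genes-by-genes matrix that module detection would then operate on.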
I just want to verify that this is an appropriate method for organizing data if one wishes to construct coexpression networks for genes "across an intervention".