I have whole-transcriptome RNA-Seq data for 20 paired samples (10 controls and their 10 matched samples stimulated with an agent). I have ~2,500 differentially expressed genes between the control and stimulated samples, and I would like to identify smaller modules of genes that act together within this large set. Ideally, these modules would then be matched with a common upstream transcription factor or other regulator. I would like to perform a co-expression analysis on my differentially expressed genes, but I am not sure what the best way to do this with paired samples is.
1) What is the best input data for calculating correlations and co-expression? I have considered: the read counts mapped to my genes (which would not take the pairing between samples into account), the difference in read counts between each stimulated sample and its control (one value per sample pair), or the log2 fold change within each pair. Maybe there is an even better solution?
2) Which correlation metric would be better to use with this small sample size? Pearson or Spearman?
3) When clustering my genes, is it better to consider only genes correlated in one direction, or in both (positively and negatively correlated)?
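To make the options in questions 1 to 3 concrete, here is a minimal Python sketch of what I have in mind. It is only an illustration under my own assumptions: the function names, the pseudocount of 1, and the toy counts are made up for this post and are not real data or a recommended method.

```python
import math

def log2_fold_changes(control, stimulated, pseudocount=1.0):
    """Per-pair log2 fold change for one gene (one value per sample pair).
    The pseudocount is an assumption of this sketch; it avoids log(0)."""
    return [math.log2((s + pseudocount) / (c + pseudocount))
            for c, s in zip(control, stimulated)]

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def signed_distance(r):
    """Only positively correlated genes end up close (r = -1 gives 1)."""
    return (1 - r) / 2

def unsigned_distance(r):
    """Positively and negatively correlated genes both end up close."""
    return 1 - abs(r)

# Toy (made-up) normalized counts for two hypothetical genes in 5 pairs.
gene_a_ctrl, gene_a_stim = [100, 120, 90, 110, 95], [210, 400, 150, 330, 180]
gene_b_ctrl, gene_b_stim = [50, 60, 45, 55, 48], [30, 20, 35, 22, 33]

lfc_a = log2_fold_changes(gene_a_ctrl, gene_a_stim)
lfc_b = log2_fold_changes(gene_b_ctrl, gene_b_stim)

r_p = pearson(lfc_a, lfc_b)
r_s = spearman(lfc_a, lfc_b)
print("Pearson:", r_p, "Spearman:", r_s)
print("signed distance:", signed_distance(r_p),
      "unsigned distance:", unsigned_distance(r_p))
```

In this sketch each gene is reduced to one log2 fold change per pair before correlating, so the pairing is kept; swapping `log2_fold_changes` for the raw counts or for simple differences would give the other two inputs I listed.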
Any help would be greatly appreciated!