I have an RNA-seq expression count matrix for two contrasting conditions (~10 biological samples per condition), but the conditions are affected by a (severe) batch effect from different sequencing experiments. I looked for batch-effect removal tools, but the ones I found can only correct batch effects among samples of the same condition group (different conditions may contain large true biological variation that accounts for most of the apparent batch difference).
I plan to choose a set of housekeeping genes to adjust for this group difference, but I am still confused about the practical steps. Could you please give me some suggestions? Here are the points I am still unsure about:
- Should I compute TPM and then apply TMM cross-sample normalization before looking at the expression values of these housekeeping genes?
- The housekeeping genes themselves probably differ widely in expression level. How can I combine them into a single scaling factor per sample, and then scale the expression levels of all other genes by that factor?
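
To make the second point concrete, here is a minimal sketch of one way this could work. It is an assumption on my side, not something I have validated. It uses a median-of-ratios idea restricted to the housekeeping genes: each housekeeping gene is compared with its own geometric mean across samples, so highly and lowly expressed genes contribute on the same scale, and the per-sample median of those ratios becomes the scaling factor. The function names, the toy matrix, and the choice of rows 0 and 1 as housekeeping genes are all hypothetical.

```python
import numpy as np


def housekeeping_size_factors(counts, hk_rows):
    """Return one scaling factor per sample from housekeeping genes only.

    counts  : genes x samples array of raw counts
    hk_rows : row indices of the (assumed) housekeeping genes
    """
    hk = np.asarray(counts, dtype=float)[hk_rows, :]
    # Drop housekeeping genes with a zero count in any sample (log undefined).
    hk = hk[(hk > 0).all(axis=1)]
    if hk.shape[0] == 0:
        raise ValueError("no housekeeping gene has non-zero counts in all samples")
    log_hk = np.log(hk)
    # Per-gene reference: log of the geometric mean across samples.
    log_ref = log_hk.mean(axis=1, keepdims=True)
    # Per-sample factor: median log-ratio to the reference, back-transformed.
    return np.exp(np.median(log_hk - log_ref, axis=0))


def scale_counts(counts, factors):
    """Divide every gene in each sample by that sample's factor."""
    return np.asarray(counts, dtype=float) / factors


# Toy data (hypothetical): 3 genes x 2 samples, sample 2 sequenced twice as deep.
counts = np.array([
    [100, 200],    # housekeeping gene, low expression
    [1000, 2000],  # housekeeping gene, high expression
    [50, 100],     # any other gene
])
factors = housekeeping_size_factors(counts, hk_rows=[0, 1])
scaled = scale_counts(counts, factors)
```

In this toy case the two housekeeping genes differ tenfold in expression but still agree that sample 2 needs a factor twice that of sample 1. Whether this is appropriate on raw counts or only after TPM/TMM is exactly what I am unsure about.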
Thank you very much.