I have two datasets, one with ~250 samples and another with 7 samples. Both consist of RPKM values computed from human RNA-Seq. I don't have access to the raw read files.
Is there a good way to batch-correct these datasets so that I can combine them and scan for expression signatures? To compare samples, I'm currently using an algorithm that takes the geometric mean of the RPKM values for the group of genes belonging to a given signature. The problem is that the RPKM values in the ~250-sample dataset are, on average, much higher than those in the 7-sample dataset.
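For concreteness, the scoring step looks roughly like the minimal sketch below. The function name `signature_score`, the pseudocount of 1, and the toy gene names are illustrative assumptions, not my actual implementation. It also shows why a dataset-wide shift in RPKM scale is a problem: the same expression pattern at a higher overall scale gets a higher score.

```python
import math

def signature_score(rpkm_by_gene, signature_genes, pseudocount=1.0):
    """Geometric mean of (RPKM + pseudocount) over the signature genes found in one sample."""
    # Assumption: a pseudocount is added so that zero RPKM values do not break the log.
    values = [rpkm_by_gene[g] + pseudocount for g in signature_genes if g in rpkm_by_gene]
    if not values:
        raise ValueError("none of the signature genes are present in this sample")
    # Geometric mean computed as exp(mean(log(x))) to avoid overflow in the product.
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Toy data: sample_b has the same relative pattern as sample_a at 10x the scale.
signature = ["GENE1", "GENE2", "GENE3"]
sample_a = {"GENE1": 3.0, "GENE2": 15.0, "GENE3": 63.0}
sample_b = {gene: 10 * value for gene, value in sample_a.items()}

score_a = signature_score(sample_a, signature)
score_b = signature_score(sample_b, signature)
```

Here `score_b` comes out larger than `score_a` purely because of the scale difference, which is the effect I need to remove before combining the two datasets.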
I've used ComBat in the past for the same problem with microarray expression data, and it worked perfectly. I'm looking for something analogous for RPKM expression data.