I have paired-end RNA-seq reads from a drug-treatment experiment. Many samples have fewer than 15 million mapped reads, which is too few, and the number of mapped reads varies widely across biological replicates. Differential expression and splicing analyses on these samples indicate that the statistical power of my tests could be improved with greater sequencing depth, and I still have RNA from these samples available for re-sequencing.
Is it analytically and statistically tractable to re-sequence the same samples and control for potential artifacts in the combined data?
What would be the best workflow for merging data from these two RNA-seq runs? My guess is that it's best to keep the runs separate until the counts have been summarized. Then I can carry out PCA to visually inspect the gross extent of artifacts between the runs before merging the counts for statistical analyses.
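A minimal sketch of the workflow described above, assuming per-run count matrices have already been produced (genes x libraries, one column per sample-run pair) and that re-sequenced libraries of the same RNA are treated as technical replicates whose counts can simply be summed. This is the same idea as DESeq2's `collapseReplicates`, but the helper names `collapse_runs` and `pca_scores` here are hypothetical, and the log2(CPM + 1) transform is a simple stand-in for a variance-stabilizing transform.

```python
import numpy as np


def collapse_runs(counts, sample_ids):
    """Sum count columns that come from the same biological sample.

    counts: genes x libraries integer matrix (one column per sample-run pair).
    sample_ids: biological sample label for each column.
    Returns (genes x samples matrix, list of unique sample labels in first-seen order).
    """
    samples = list(dict.fromkeys(sample_ids))  # unique labels, order preserved
    merged = np.zeros((counts.shape[0], len(samples)), dtype=counts.dtype)
    for j, s in enumerate(sample_ids):
        merged[:, samples.index(s)] += counts[:, j]
    return merged, samples


def pca_scores(counts, n_components=2):
    """PCA on log2(CPM + 1) of a genes x libraries count matrix.

    Returns a libraries x n_components array of PC scores, so each
    library (e.g. each sample-run pair) can be plotted and colored by run.
    """
    cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6  # library-size scaling
    logcpm = np.log2(cpm + 1.0)
    x = logcpm.T - logcpm.T.mean(axis=0)  # center each gene across libraries
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]
```

In this sketch, `pca_scores` would be run on the uncollapsed per-run matrix first (to see whether libraries separate by run), and `collapse_runs` applied only afterwards, when the per-sample totals are passed to the differential expression tool.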
Beyond gross visual inspection of PCs, what sorts of quality control steps could I take if I identify a strong batch effect between the sequencing runs? Would software like svaseq or ComBat be appropriate here if I do identify a batch effect? If so, would it be best to remove the batch effect from the samples before combining the count data?
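One way to go a step beyond eyeballing PC plots is to quantify how much of each principal component's variance is associated with the run label. The sketch below computes a one-way ANOVA R² per PC; it assumes PC scores (libraries x PCs) and a run label per library are already available, and the function name `pc_batch_r2` is hypothetical. It is only a diagnostic and does not remove any batch effect.

```python
import numpy as np


def pc_batch_r2(scores, batch):
    """Fraction of each PC's variance explained by a categorical batch label.

    scores: libraries x PCs array of PC scores.
    batch: run label for each library (same order as the rows of scores).
    Returns one R^2 value per PC: 1 - within-batch SS / total SS.
    """
    batch = np.asarray(batch)
    r2 = []
    for k in range(scores.shape[1]):
        pc = scores[:, k]
        total_ss = ((pc - pc.mean()) ** 2).sum()
        # Sum of squared deviations from each run's own mean
        within_ss = sum(
            ((pc[batch == b] - pc[batch == b].mean()) ** 2).sum()
            for b in np.unique(batch)
        )
        r2.append(1.0 - within_ss / total_ss)
    return np.array(r2)
```

A value near 1 for a leading PC would suggest that component mostly separates the runs, while values near 0 suggest the run label explains little of that component.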