I am analysing RNA-seq data from two sample groups, each with 10+ samples, to look for differential gene expression (with DESeq2) and differential exon/junction usage (with JunctionSeq).
My understanding is that statistical power is greatest when I use all samples (as long as there are no outliers, etc.) to obtain the results. However, I have recently been asked to subset both sample groups into smaller groups and rerun the analysis to see whether the same results come out, and to do this for all combinations of samples drawn from the groups, i.e. a Monte Carlo simulation.
This seems a little strange to me: won't the significance estimates from the smaller subsets be less reliable than those from the complete data set?
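To make the request concrete, here is a minimal sketch of the subsampling procedure as I understand it. It is not tied to any real pipeline: `call_significant` is a hypothetical placeholder for one complete DESeq2 or JunctionSeq run that returns the set of significant gene IDs, and the function names, `k`, `n_iter` and `seed` are my own assumptions. It draws random subsets rather than every combination.

```python
import random
from collections import Counter
from math import comb


def subsample_stability(group_a, group_b, k, call_significant, n_iter=100, seed=0):
    """Fraction of subsampled runs in which each gene is called significant.

    group_a, group_b : lists of sample IDs for the two conditions
    k                : number of samples drawn from each group per run
    call_significant : hypothetical callable (samples_a, samples_b) -> set of
                       gene IDs; stands in for a full DESeq2/JunctionSeq run
    """
    if k > min(len(group_a), len(group_b)):
        raise ValueError("k cannot exceed the size of the smaller group")
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    hits = Counter()
    for _ in range(n_iter):
        # Draw k samples without replacement from each group.
        sub_a = rng.sample(group_a, k)
        sub_b = rng.sample(group_b, k)
        # Count each gene once per run in which it is called significant.
        hits.update(set(call_significant(sub_a, sub_b)))
    return {gene: count / n_iter for gene, count in hits.items()}


def n_exhaustive_runs(n_a, n_b, k):
    """Number of runs needed to cover every pair of size-k subsets."""
    return comb(n_a, k) * comb(n_b, k)
```

As an illustration only, assuming exactly 10 samples per group and `k = 5`, `n_exhaustive_runs(10, 10, 5)` gives 63504 runs, so covering literally all combinations looks impractical and random draws seem like the only realistic option.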
Apologies if this seems like a weird question or a weird thought process, but it's been suggested to me a number of times that I should do this.
Any thoughts very much appreciated :D