Question

RNA-seq subsampling for Monte Carlo simulation to test data?


7.5 years ago

nicholas.owen1 • 0

I am investigating RNA-seq data from two sample groups, each with 10+ samples, looking for differential gene expression and differential exon/splice-junction usage (with DESeq2 and JunctionSeq, respectively).

My understanding is that statistical power is greater when I use all samples (provided there are no outliers, etc.). However, I have recently been asked to split both sample groups into smaller subsets, rerun the analysis to see whether the same results come out, and repeat this for all combinations of samples drawn from the groups, i.e. a Monte Carlo simulation.
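For scale, here is a minimal Python sketch, assuming 10 samples per group and subsets of 5 per group (both numbers are illustrative, not taken from the post). Enumerating every combination quickly becomes impractical, which is why a Monte Carlo approach draws a random set of subset pairs instead. The sample IDs are hypothetical, and each drawn pair would still have to be run through DESeq2/JunctionSeq separately (not shown).

```python
import math
import random


def n_subset_pairs(n_a, n_b, k):
    """Number of ways to pick k samples from group A and k samples from group B."""
    return math.comb(n_a, k) * math.comb(n_b, k)


def random_subset_pairs(samples_a, samples_b, k, n_draws, seed=0):
    """Monte Carlo alternative to full enumeration: draw n_draws random (A, B) subset pairs of size k."""
    rng = random.Random(seed)
    return [
        (sorted(rng.sample(samples_a, k)), sorted(rng.sample(samples_b, k)))
        for _ in range(n_draws)
    ]


# Hypothetical sample IDs: 10 samples per group.
group_a = [f"A{i}" for i in range(1, 11)]
group_b = [f"B{i}" for i in range(1, 11)]

print(n_subset_pairs(10, 10, 5))  # 252 * 252 = 63504 possible pairs
draws = random_subset_pairs(group_a, group_b, k=5, n_draws=100)
print(len(draws), draws[0])
```

Even at 10 samples per group there are 63,504 possible 5-vs-5 pairings, so in practice a few hundred random draws are usually analysed rather than every combination.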

This seems a little strange to me, since the significance of the output from a subset will be less reliable than that from the complete data set?

Apologies if this seems like a weird question or thought process, but it has been suggested a number of times that I should do this.

Any thoughts very much appreciated :D

RNA-Seq Monte-Carlo



This sounds like what was done here, though for an odd reason. But yes, I would ask whoever requested this to "lay out the statistical reasoning for this in light of its effect on empirical Bayes shrinkage." Use that phrase literally: if they can't interpret it, they don't know enough to ask you to do this.

7.5 years ago by Devon Ryan 104k


I've seen this done to measure the effect of sample number on the power to detect differences. For example, this sort of analysis was used in Schurch et al. (2016) (the companion paper to the above).
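A minimal Python sketch of this kind of subsampling analysis, under loud assumptions: `run_de` is a hypothetical callable standing in for a full DESeq2 run that returns the set of significant gene IDs, and the full-data result is treated as the reference. It measures only recall of the full-data calls and is not presented as the exact procedure of Schurch et al. (2016). `toy_run_de` is a made-up stand-in whose "power" grows with group size, purely so the sketch runs.

```python
import random
from statistics import mean


def subsample_recall(samples_a, samples_b, run_de, k, n_draws, seed=0):
    """Mean fraction of full-data DE genes recovered when each group is cut to k samples.

    run_de is a hypothetical callable: (list_a, list_b) -> set of DE gene IDs.
    """
    rng = random.Random(seed)
    reference = run_de(samples_a, samples_b)  # full data treated as the reference set
    if not reference:
        return float("nan")
    recalls = []
    for _ in range(n_draws):
        sub_a = rng.sample(samples_a, k)
        sub_b = rng.sample(samples_b, k)
        hits = run_de(sub_a, sub_b)
        recalls.append(len(hits & reference) / len(reference))
    return mean(recalls)


def toy_run_de(a, b):
    """Hypothetical stand-in for a DE run: calls 10 genes per sample in the smaller group."""
    n = min(len(a), len(b))
    return {f"gene{i}" for i in range(min(100, 10 * n))}


group_a = [f"A{i}" for i in range(1, 11)]
group_b = [f"B{i}" for i in range(1, 11)]

for k in (3, 5, 8, 10):
    print(k, subsample_recall(group_a, group_b, toy_run_de, k, n_draws=20))
```

With a real pipeline, plotting recall against `k` gives a rough picture of how much detection power is lost as replicates are removed.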

One might also imagine it being used to determine whether a smaller number of differentially expressed genes from one analysis, compared with another, is due to reduced power.
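A rough Python sketch of that check, assuming the larger analysis is downsampled to the smaller one's per-group size. `count_de` is a hypothetical stand-in for a DE pipeline that returns the number of significant genes, and the group sizes and counts below are made up for illustration.

```python
import random


def de_count_null(big_a, big_b, count_de, k, n_draws, seed=0):
    """DE-gene counts from the larger data set after cutting each group down to k samples.

    count_de is a hypothetical callable: (list_a, list_b) -> number of DE genes.
    """
    rng = random.Random(seed)
    return [count_de(rng.sample(big_a, k), rng.sample(big_b, k)) for _ in range(n_draws)]


def fraction_at_or_below(counts, observed):
    """Share of subsampled runs that yield no more DE genes than the smaller analysis did."""
    return sum(c <= observed for c in counts) / len(counts)


def toy_count_de(a, b):
    """Hypothetical stand-in: 10 DE genes per sample in the smaller group."""
    return 10 * min(len(a), len(b))


# Made-up example: larger study has 12 samples per group, smaller study has 6.
big_a = [f"A{i}" for i in range(1, 13)]
big_b = [f"B{i}" for i in range(1, 13)]
counts = de_count_null(big_a, big_b, toy_count_de, k=6, n_draws=50)
print(fraction_at_or_below(counts, observed=60))
```

If the smaller analysis's DE count falls well within the spread of the downsampled counts, reduced power is a plausible explanation; if nearly all downsampled runs still find many more genes, something beyond sample size is likely involved.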

7.5 years ago by i.sudbery 19k