RNA-seq subsampling for Monte Carlo simulation to test data?
5.9 years ago

I am investigating some RNA-seq data from two sample groups, each with 10+ samples, to look for differential gene expression and differential exon/junction usage (with DESeq2 and JunctionSeq respectively).

My understanding is that the power of the analysis is greatest when I use all samples (as long as there are no outliers etc.). However, I have recently been asked to subset both sample groups into smaller groups and run the analysis again to see if the same things come out, and to run it for all combinations of samples drawn from the groups, i.e. a Monte Carlo simulation.
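To make the scale of "all combinations" concrete, here is a minimal Python sketch that counts the possible subset pairs and draws a random selection of them instead. The function names, sample labels and the subset size `k` are hypothetical, chosen only for illustration; the differential expression step itself (DESeq2 or JunctionSeq) would still be run separately on each drawn subset.

```python
import math
import random


def n_subset_pairs(n_a, n_b, k):
    """Number of ways to pick k samples from each of two groups."""
    return math.comb(n_a, k) * math.comb(n_b, k)


def draw_subsets(group_a, group_b, k, n_draws, seed=0):
    """Draw distinct random (subset_a, subset_b) pairs without replacement.

    group_a and group_b are lists of sample labels (hypothetical names).
    Returns a sorted list of pairs of tuples, each tuple holding k labels.
    """
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    max_pairs = n_subset_pairs(len(group_a), len(group_b), k)
    target = min(n_draws, max_pairs)  # cannot draw more pairs than exist
    seen = set()
    while len(seen) < target:
        a = tuple(sorted(rng.sample(group_a, k)))
        b = tuple(sorted(rng.sample(group_b, k)))
        seen.add((a, b))  # the set silently drops repeated pairs
    return sorted(seen)


if __name__ == "__main__":
    group_a = [f"A{i}" for i in range(1, 11)]  # 10 hypothetical samples
    group_b = [f"B{i}" for i in range(1, 11)]
    print(n_subset_pairs(10, 10, 5))  # 252 * 252 = 63504
    for a, b in draw_subsets(group_a, group_b, k=5, n_draws=3):
        print(a, b)
```

With 10 samples per group and subsets of 5, there are 63504 possible pairs, so a random draw of a few dozen or a few hundred is usually what "Monte Carlo" means in practice, rather than an exhaustive enumeration.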

This seems a little strange to me, as surely the significance of the output from the subsets is going to be less valid than that from the complete data set?

Apologies if this seems like a weird question or a weird thought process, but it's been mentioned a number of times that I should do this.

Any thoughts very much appreciated :D

RNA-Seq Monte-Carlo • 1.9k views

Sounds like what was done here, but for a weird reason. But yes, I would ask whoever requested this to "lay out the statistical reasoning for this in light of its effect on empirical Bayes shrinkage." Use that phrase literally, since if they can't interpret it then they don't know enough to ask you to do this.


I've seen this done to measure the effect of sample number on the power to detect differences. For example, this sort of analysis was used in Schurch et al. (2016) (companion paper to the above).

One might also imagine it being used to check whether a reduced number of differentially expressed genes coming out of one analysis, compared to another, is due to reduced power.
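As a rough illustration of how the subsampled runs could be summarised, the Python sketch below computes, for each subset size, the mean fraction of the full-dataset hits that the subsampled runs recover. It assumes the significant gene lists have already been produced by separate DESeq2 runs; the function names and gene IDs are hypothetical, and this is not necessarily the summary used in the paper mentioned above.

```python
from statistics import mean


def recovery_fraction(full_hits, subset_hits):
    """Fraction of the full-dataset significant genes also found in a subset run."""
    full_hits = set(full_hits)
    if not full_hits:
        return float("nan")  # nothing to recover if the full run found no hits
    return len(full_hits & set(subset_hits)) / len(full_hits)


def mean_recovery_by_size(full_hits, runs_by_size):
    """Average recovery per subset size.

    runs_by_size maps a subset size k to a list of hit lists, one per
    subsampled run (assumed to come from separate DESeq2 runs).
    """
    return {
        k: mean(recovery_fraction(full_hits, hits) for hits in runs)
        for k, runs in sorted(runs_by_size.items())
    }


if __name__ == "__main__":
    # Hypothetical gene IDs, for illustration only.
    full = {"g1", "g2", "g3", "g4"}
    runs = {
        3: [{"g1"}, {"g1", "g2"}],
        6: [{"g1", "g2", "g3"}, {"g1", "g2", "g3", "g4"}],
    }
    print(mean_recovery_by_size(full, runs))
```

If recovery rises steadily with subset size, that is consistent with fewer hits in a smaller analysis being a power effect rather than a real biological difference.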

