From the power analysis tools I've seen for RNA-seq, I feel like many published RNA-seq studies are wildly underpowered but no one ever seems to bat an eye about it. I very often see RNA-seq studies with 6, 8, or 10 samples per group, and from the power analysis tools I've looked at (https://cqs-vumc.shinyapps.io/rnaseqsamplesizeweb/ ; http://www2.hawaii.edu/~lgarmire/RNASeqPowerCalculator.htm ), it seems like these studies aren't really able to pick up on much of anything. I also see studies reporting only raw p-values and not FDR-corrected ones.
Why do we accept this, and why does nobody talk about it?
We absolutely should not accept the use of raw, unadjusted p-values in RNAseq experiments. Thankfully this is much less common now than it used to be in the early days of microarrays.
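To illustrate why adjustment matters, here is a minimal sketch of the Benjamini-Hochberg procedure (the same adjustment as R's p.adjust(method="BH")), written in Python with numpy. The p-values below are made up for illustration: five of the ten are below 0.05 raw, but only two survive at FDR < 0.05.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg FDR adjustment (equivalent to p.adjust(method='BH'))."""
    p = np.asarray(pvals, dtype=float)
    n = len(p)
    order = np.argsort(p)                         # indices that sort p ascending
    ranked = p[order] * n / np.arange(1, n + 1)   # p * n / rank
    # enforce monotonicity: each adjusted value is the running minimum
    # taken from the largest p-value downwards
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.clip(adjusted, 0, 1)
    out = np.empty(n)
    out[order] = adjusted
    return out

# hypothetical raw p-values from 10 gene-level tests
pvals = [0.001, 0.008, 0.02, 0.03, 0.04, 0.2, 0.4, 0.6, 0.8, 0.9]
q = bh_adjust(pvals)
print("raw < 0.05:", int(np.sum(np.array(pvals) < 0.05)))
print("FDR < 0.05:", int(np.sum(q < 0.05)))
```

With 20,000 genes rather than 10 tests, the gap between the raw and adjusted counts becomes far more dramatic, which is exactly why reporting raw p-values alone is unacceptable.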
Estimating power in RNAseq is very difficult. Part of the problem is that we often have no idea what effect sizes to expect; asking what the dispersion is doesn't really make sense, because we have 20,000 genes and each will have a different dispersion; and finally, we often have no idea how many genes we expect to be DE. Empirical studies of power in RNAseq experiments suggest that five replicates give roughly 80% power to detect two-fold changes, and recommend doing six so that a poor-quality replicate can be discarded.
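To make the per-gene calculation concrete, here is a rough sketch of the kind of arithmetic these power tools do for a single gene. Everything here is an assumed input (mean count, negative binomial dispersion, fold change); real tools estimate these from pilot data, and this normal approximation to a Wald test on the log fold change (with delta-method variance 1/mu + dispersion per sample) is a simplification, not any particular tool's method.

```python
from math import erf, log, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def approx_power(n_per_group, mean_count, dispersion, fold_change,
                 z_crit=1.959964):  # two-sided critical value for alpha = 0.05
    """Approximate power for a two-group Wald test on the log fold change.

    Assumes negative binomial counts with the same mean and dispersion in
    both groups; Var(log X) is approximated as 1/mu + dispersion per sample.
    """
    var_log = 1.0 / mean_count + dispersion    # per-sample variance of log counts
    se = sqrt(2.0 * var_log / n_per_group)     # SE of the difference in group means
    effect = abs(log(fold_change))             # effect size on the natural-log scale
    return norm_cdf(effect / se - z_crit)

# hypothetical well-expressed gene: mean count 100, dispersion 0.1, 2-fold change
for n in (3, 5, 10):
    print(n, "replicates:", round(approx_power(n, 100, 0.1, 2.0), 2))
```

Even under these fairly generous assumptions, three replicates fall short of 80% power for a two-fold change, while five clear it, which is broadly consistent with the empirical recommendations above. Lowly expressed genes (small mean count) or noisy genes (high dispersion) fare much worse, and genome-wide multiple testing makes the real picture harsher still.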
We put up with the current situation for many reasons, but it is worth noting that RNAseq experiments are rarely hypothesis tests of the DE status of particular genes (which is what power calculations address). Power to detect pathways, correlation structures, genotype-expression relationships, etc. is not captured by power calculations on individual genes. Added to this, RNAseq experiments are more often hypothesis generating than hypothesis testing, and studies should contain further tests of the hypotheses generated from the RNAseq experiment.
As someone who works with RNA-seq quite frequently, on both cell lines and human samples, I feel compelled to comment here. No serious researcher would set out to do an RNA-seq study (or any NGS study, for that matter) without the proper number of replicates.
In my experience, we have used at least 3 replicates for all our studies. This is manageable for the most part with cell lines, where sample material is readily available even if something goes wrong in the sample processing steps. In most cases there is also sufficient material left over after sequencing that can easily be used if something does go wrong.

However, if you are working with patient samples, a lot of issues can arise in the processing pipeline, and some samples/replicates have to be discarded. Additionally, the quality of the samples themselves can be quite poor (e.g., degraded tissue from paraffin-embedded samples). Obtaining human samples is not a quick or easy process, as you have to go through extensive review of ethical concerns. In some cases the samples themselves may be quite rare (e.g., for a rare disease), and it is simply not possible to get enough samples to meet the needed statistical rigor. In those cases the researcher has to make a judgement call between not publishing the result and publishing it despite low statistical power, in order to share potentially useful information with the larger research community. A reviewer examining such a paper may also overlook the sub-par statistics in favor of the biologically important message the paper conveys. In a common scenario, there may also be considerable pressure on the researcher to publish, from funding sources and collaborators; even if it were possible to get additional samples, the timeline might not be conducive for everyone involved.
So yes, in an ideal world it would be best if studies were sufficiently powered... however, in the real world it is not always possible to do so, due to multiple factors. The onus then lies on the reader to make an informed judgement about the publication they are reading.