Hello! I have a question about using bootstrapping in RNA-seq analysis.
I’m working with a dataset containing three biological replicates for each of two conditions. During EDA (PCA and heatmap clustering), I observed that the replicates in condition 1 cluster closely together and show consistent gene expression patterns, but in condition 2, the replicates are a bit more spread out, especially one of them.
That makes things tricky when trying to detect up- or downregulated genes between the two conditions, since the variability in condition 2 is pretty high. I don’t have technical replicates, so I can’t really dig deeper into what’s causing that variability, whether it’s biological or technical.
So my question is: would bootstrapping the biological replicates be useful here to explore that variability or to get a better sense of how stable the differential expression patterns are? I know it doesn’t create new data, but could it help highlight how sensitive the results are to which samples are included?
Curious if this is a reasonable approach, or not really recommended in this kind of setup.
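To make the "which samples are included" idea concrete: three replicates resampled with replacement give only ten distinct combinations per condition, so a leave-one-out check is a simpler stand-in for a bootstrap here. The sketch below is only an illustration and not a differential expression method. The counts are made up, the helper names `log2_fold_change` and `leave_one_out_lfc` are hypothetical, and the pseudocount of 1 is an assumption.

```python
import math


def log2_fold_change(cond1, cond2, pseudocount=1.0):
    """log2 ratio of mean expression (cond2 over cond1).

    The pseudocount is an assumed convenience to avoid log(0).
    """
    mean1 = sum(cond1) / len(cond1)
    mean2 = sum(cond2) / len(cond2)
    return math.log2((mean2 + pseudocount) / (mean1 + pseudocount))


def leave_one_out_lfc(cond1, cond2):
    """Recompute the fold change, dropping each condition-2 replicate in turn."""
    results = []
    for i in range(len(cond2)):
        kept = cond2[:i] + cond2[i + 1:]
        results.append(log2_fold_change(cond1, kept))
    return results


# Made-up normalized counts for a single gene (illustration only).
cond1 = [100.0, 110.0, 90.0]
cond2 = [200.0, 210.0, 400.0]  # the third replicate is the outlying one

full = log2_fold_change(cond1, cond2)
loo = leave_one_out_lfc(cond1, cond2)
spread = max(loo) - min(loo)

print("all replicates:", round(full, 3))
for i, value in enumerate(loo, start=1):
    print("without condition-2 replicate", i, ":", round(value, 3))
print("spread across leave-one-out runs:", round(spread, 3))
```

A large spread relative to the full estimate would suggest that the call for that gene depends heavily on one sample. In a real analysis the same loop would wrap a full DE run rather than a raw ratio of means.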
I don't think I would be concerned about this slightly greater variability between reps, especially if they are biological reps. I also don't think PCA is a good measure for that question in this case.
You can look into correlation plots. I've also appreciated the idea of the SERE coefficient. https://link.springer.com/article/10.1186/1471-2164-13-524
Nevertheless, a robust DE package is pretty much designed for these situations.
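For the correlation check suggested above, a rough sketch might look like this. It computes pairwise Pearson correlations on log2(count + 1) values between replicates. This is plain Pearson correlation, not the SERE coefficient from the linked paper. The sample names and counts are made up, and `pearson` and `pairwise_correlations` are hypothetical helpers.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(samples):
    """Correlate every pair of samples on log2(count + 1) values."""
    logged = {
        name: [math.log2(count + 1) for count in counts]
        for name, counts in samples.items()
    }
    return {
        (a, b): pearson(logged[a], logged[b])
        for a, b in combinations(sorted(logged), 2)
    }


# Made-up counts for six genes in three replicates (illustration only).
samples = {
    "c2_rep1": [10, 200, 35, 800, 0, 50],
    "c2_rep2": [12, 180, 40, 750, 1, 55],
    "c2_rep3": [30, 90, 80, 300, 5, 20],
}

corr = pairwise_correlations(samples)
for pair, r in corr.items():
    print(pair, round(r, 3))
```

A replicate that correlates noticeably less well with the other two is the one worth a closer look, although with real data the genome-wide values are often all high, so the differences between pairs matter more than the absolute numbers.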