Question

Bootstrapping in RNA-seq


12 weeks ago

littlebioinformatician • 0

Hello! I have a question about using bootstrapping in RNA-seq analysis.

I’m working with a dataset containing three biological replicates for each of two conditions. During EDA (PCA and heatmap clustering), I observed that the replicates in condition 1 cluster closely together and show consistent gene expression patterns, but in condition 2, the replicates are a bit more spread out, especially one of them.


That makes things tricky when trying to detect up- or downregulated genes between the two conditions, since the variability in condition 2 is pretty high. I don’t have technical replicates, so I can’t really dig deeper into what’s causing that variability, whether it’s biological or technical.

So my question is: would bootstrapping the biological replicates be useful here to explore that variability or to get a better sense of how stable the differential expression patterns are? I know it doesn’t create new data, but could it help highlight how sensitive the results are to which samples are included?

Curious if this is a reasonable approach, or not really recommended in this kind of setup.


I don't think I would be very concerned about this slightly greater variability between replicates, especially if they are biological replicates. I also don't think PCA is a good measure for that question in this case.

You can look into correlation plots. I also like the idea of the SERE (simple error ratio estimate) coefficient: https://link.springer.com/article/10.1186/1471-2164-13-524
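A minimal Python/NumPy sketch of the SERE idea as I read it from the linked paper: for each gene, compare the observed counts across samples with the counts expected if every sample were a Poisson draw from the same proportions (scaled by library size), then take the square root of the pooled chi-square per degree of freedom. Roughly, 0 means identical samples, about 1 means Poisson-level (technical) variation, and larger values mean extra variation. The function name `sere` and the input layout (raw counts, genes × samples) are assumptions; check the paper's own definition before relying on it.

```python
import numpy as np


def sere(counts) -> float:
    """Simple error ratio estimate for a genes x samples matrix of raw counts (sketch)."""
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)          # total counts per sample
    totals = counts.sum(axis=1)             # total counts per gene
    keep = totals > 0                       # genes with no reads carry no information
    counts, totals = counts[keep], totals[keep]
    # Expected counts if all samples shared the same per-gene proportions
    expected = np.outer(totals, lib_sizes / lib_sizes.sum())
    # Per-gene Pearson chi-square statistic, (k - 1) degrees of freedom each
    chi2 = ((counts - expected) ** 2 / expected).sum(axis=1)
    k = counts.shape[1]
    return float(np.sqrt(chi2.sum() / (len(totals) * (k - 1))))
```

Applied pairwise to the three condition 2 replicates, this would give one number per pair that can be compared with the condition 1 pairs.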

Nevertheless, robust DE packages are pretty much designed for exactly these situations.

rfran010 ★ 1.6k · 12 weeks ago

score 4 · Accepted Answer · 2025-06-24

Your PC1 explains 75% of the variance and is driven by the separation between groups, so I would expect lots of DEGs. Rather than home-cooked approaches like bootstrapping, I would use established statistics such as the limma package and prioritize genes with large logFCs, e.g. with the treat function, which tests directly whether a change is significantly beyond a fold-change threshold such as 1.2 or 1.5. That is more robust than bootstrapping, and much more citable.
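To illustrate what a thresholded test does, here is a minimal Python sketch (NumPy/SciPy) of the per-gene test of the null hypothesis |log2FC| <= log2(fc). Assumptions: `treat_like_test` is a hypothetical name, the input is a genes × samples matrix of log2 expression with a 0/1 group vector, and it uses an ordinary pooled-variance t-statistic. limma's treat applies the same idea to empirical Bayes moderated t-statistics, so in practice you would call treat in R rather than this.

```python
import numpy as np
from scipy import stats


def treat_like_test(log_expr: np.ndarray, group: np.ndarray, fc: float = 1.5):
    """Per-gene t-test of H0: |log2FC| <= log2(fc), using a plain (unmoderated) pooled variance."""
    a = log_expr[:, group == 0]
    b = log_expr[:, group == 1]
    n1, n2 = a.shape[1], b.shape[1]
    df = n1 + n2 - 2
    logfc = b.mean(axis=1) - a.mean(axis=1)
    # Pooled within-group variance for each gene (assumed nonzero)
    ss_a = ((a - a.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    ss_b = ((b - b.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    se = np.sqrt((ss_a + ss_b) / df * (1 / n1 + 1 / n2))
    tau = np.log2(fc)
    abs_lfc = np.abs(logfc)
    # Worst case of the composite null sits at logFC = +/- tau
    p = stats.t.sf((abs_lfc - tau) / se, df) + stats.t.sf((abs_lfc + tau) / se, df)
    return logfc, p
```

A gene with no change gets a p-value of 1, and a gene whose logFC sits right at the threshold gets about 0.5, so only changes clearly beyond the threshold come out significant.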

score 2 · Accepted Answer · 2025-06-24


12 weeks ago

swbarnes2 15k

You could try to figure out what PC2 represents, and whether it is something meaningful, like tissue contamination.

You could also include PC2 as a covariate in your design.

But ignoring it is simple, and valid.
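Including PC2 in the design, as suggested above, can be sketched in Python/NumPy: compute sample scores by PCA, then fit expression ~ intercept + condition + PC2 per gene by least squares. Assumptions: `pc_scores` and `fit_with_pc2` are hypothetical names, the input is a genes × samples matrix of log-scale expression with a 0/1 condition vector, and this is plain OLS. In practice you would pass the PC2 scores as a numeric covariate to your DE package (e.g. a design such as `~ PC2 + condition` in DESeq2) instead. With 3 vs 3 samples, the three coefficients leave only 3 residual degrees of freedom.

```python
import numpy as np


def pc_scores(log_expr: np.ndarray, n_components: int = 2) -> np.ndarray:
    """PCA sample scores (samples x components) for a genes x samples matrix."""
    # Center each gene across samples; samples are the observations
    centered = log_expr - log_expr.mean(axis=1, keepdims=True)
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def fit_with_pc2(log_expr: np.ndarray, condition: np.ndarray):
    """Per-gene least-squares fit of expression ~ intercept + condition + PC2."""
    pc2 = pc_scores(log_expr, 2)[:, 1]
    design = np.column_stack([np.ones(len(condition)), condition, pc2])
    # Solve all genes at once: design (samples x 3) @ coef (3 x genes) ~ log_expr.T
    coef, *_ = np.linalg.lstsq(design, log_expr.T, rcond=None)
    return design, coef.T  # genes x 3; column 1 is the condition effect adjusted for PC2
```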
