Bootstrapping in RNA-seq
12 weeks ago

Hello! I have a question about using bootstrapping in RNA-seq analysis.

I’m working with a dataset containing three biological replicates for each of two conditions. During EDA (PCA and heatmap clustering), I observed that the replicates in condition 1 cluster closely together and show consistent gene expression patterns, but in condition 2, the replicates are a bit more spread out, especially one of them.

[Image: PCA plot of the samples]

That makes things tricky when trying to detect up- or downregulated genes between the two conditions, since the variability in condition 2 is pretty high. I don’t have technical replicates, so I can’t really dig deeper into what’s causing that variability, whether it’s biological or technical.

So my question is: would bootstrapping the biological replicates be useful here to explore that variability or to get a better sense of how stable the differential expression patterns are? I know it doesn’t create new data, but could it help highlight how sensitive the results are to which samples are included?

Curious if this is a reasonable approach, or not really recommended in this kind of setup.
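To make the "how sensitive are the results to which samples are included" idea concrete, here is a minimal sketch for a single gene. The expression values and sample names are made up for illustration and are not from my dataset. It enumerates every distinct bootstrap resample of the three condition 2 replicates and also does a simple leave-one-out comparison.

```python
from itertools import combinations_with_replacement
from math import log2
from statistics import mean

# Hypothetical normalized expression values for ONE gene (illustrative only).
cond1 = {"c1_rep1": 100.0, "c1_rep2": 110.0, "c1_rep3": 95.0}
cond2 = {"c2_rep1": 220.0, "c2_rep2": 180.0, "c2_rep3": 400.0}  # rep3 plays the outlier


def log2fc(reference, treated):
    """log2 ratio of the mean of `treated` over the mean of `reference`."""
    return log2(mean(treated) / mean(reference))


ref = list(cond1.values())

# With 3 replicates there are only 10 distinct bootstrap resamples
# (multisets of size 3 drawn from 3 samples), so the distribution is coarse.
resamples = list(combinations_with_replacement(sorted(cond2), 3))
boot_lfcs = sorted(log2fc(ref, [cond2[s] for s in r]) for r in resamples)

# Leave-one-out: drop each condition 2 replicate in turn and recompute.
loo_lfcs = {
    left_out: log2fc(ref, [v for s, v in cond2.items() if s != left_out])
    for left_out in cond2
}

print(f"{len(resamples)} distinct resamples, "
      f"log2FC range {boot_lfcs[0]:.2f} to {boot_lfcs[-1]:.2f}")
for left_out, lfc in loo_lfcs.items():
    print(f"without {left_out}: log2FC = {lfc:.2f}")
```

With so few replicates, the leave-one-out view probably says as much as the bootstrap does, since several of the 10 resamples consist of a single replicate repeated.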

bootstrapping RNAseq • 598 views

I don't think I would be concerned about this slightly greater variability between replicates, especially if they are biological replicates. I also don't think PCA is a good measure for this question.

You can look into correlation plots. I've also appreciated the idea of the SERE coefficient. https://link.springer.com/article/10.1186/1471-2164-13-524
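As a rough sketch of the correlation check (not the SERE coefficient itself), the snippet below computes pairwise Pearson correlations on log2(count + 1) values. The counts and sample names are invented for illustration; in practice you would use your full normalized count matrix.

```python
from math import log2, sqrt

# Hypothetical counts for five genes in each sample (illustrative only).
counts = {
    "c1_rep1": [10, 200, 35, 0, 820],
    "c1_rep2": [12, 190, 40, 1, 790],
    "c1_rep3": [9, 210, 33, 0, 845],
    "c2_rep1": [55, 80, 30, 14, 400],
    "c2_rep2": [60, 95, 36, 11, 430],
    "c2_rep3": [25, 150, 20, 40, 300],
}


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Log-transform so a few highly expressed genes do not dominate the correlation.
logged = {s: [log2(c + 1) for c in values] for s, values in counts.items()}
samples = list(logged)
corr = {(a, b): pearson(logged[a], logged[b]) for a in samples for b in samples}

# Print the matrix row by row; a replicate that correlates poorly with
# the others in its own group stands out here.
for a in samples:
    print(a, " ".join(f"{corr[(a, b)]:.3f}" for b in samples))
```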

Nevertheless, robust DE packages are pretty much designed for these situations.

12 weeks ago
ATpoint 89k

Your PC1 explains 75% of the variance and is driven by the group separation, so I would expect lots of DEGs. Rather than home-cooked approaches like bootstrapping, I would use established statistics such as the limma package and prioritize genes with large logFCs, e.g. using the treat function to test directly whether changes are significant beyond a threshold, e.g. 1.2 or 1.5. That is more robust than bootstrapping, and much more citable.
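To illustrate the difference between testing against a threshold and simply filtering on logFC afterwards, here is a small sketch. It assumes the 1.2 and 1.5 above are read as fold changes, and it uses a plain normal-approximation interval, which is not limma's moderated statistic. The gene names, logFC estimates and standard errors are hypothetical.

```python
from math import log2
from statistics import NormalDist

# Fold-change thresholds converted to the log2 scale.
fold_changes = [1.2, 1.5]
lfc_thresholds = {fc: log2(fc) for fc in fold_changes}  # 1.2 -> ~0.263, 1.5 -> ~0.585

# Hypothetical per-gene estimates: (log2 fold change, standard error).
genes = {"geneA": (2.0, 0.30), "geneB": (0.70, 0.40), "geneC": (-1.5, 0.20)}


def exceeds_threshold(lfc, se, threshold, alpha=0.05):
    """True if a normal-approximation interval for |lfc| lies beyond `threshold`.

    This is a simplification for illustration, not the test limma performs.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(lfc) - z * se > threshold


threshold = lfc_thresholds[1.5]
calls = {g: exceeds_threshold(lfc, se, threshold) for g, (lfc, se) in genes.items()}

for gene, (lfc, se) in genes.items():
    naive = abs(lfc) > threshold  # post-hoc cutoff that ignores uncertainty
    print(f"{gene}: naive cutoff {naive}, threshold test {calls[gene]}")
```

In this toy example geneB passes the naive cutoff but not the threshold test, because its estimate is too uncertain to say the change really exceeds the threshold.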

12 weeks ago

You could try to figure out what PC2 represents, in case it's something meaningful, like tissue contamination.

You could also include PC2 as a covariate in your design.

But ignoring it is simple, and valid.
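If you did go the covariate route, this is roughly what the design would look like and what it costs. The PC2 scores and sample names below are hypothetical, and the degrees-of-freedom count assumes the three design columns are linearly independent.

```python
# Six samples: three replicates in each of two conditions.
samples = ["c1_rep1", "c1_rep2", "c1_rep3", "c2_rep1", "c2_rep2", "c2_rep3"]
condition = [0, 0, 0, 1, 1, 1]           # 0 = condition 1, 1 = condition 2
pc2 = [0.1, -0.2, 0.0, 0.3, 0.4, -0.6]   # hypothetical PC2 scores per sample

# One row per sample: intercept, condition indicator, PC2 covariate.
design = [[1.0, float(c), p] for c, p in zip(condition, pc2)]

# Each extra column uses up one residual degree of freedom.
df_without_pc2 = len(samples) - 2
df_with_pc2 = len(samples) - len(design[0])

for name, row in zip(samples, design):
    print(name, row)
print(f"residual df: {df_without_pc2} without PC2, {df_with_pc2} with PC2")
```

With only six samples, going from 4 to 3 residual degrees of freedom is a real cost, which is part of why ignoring PC2 can be the pragmatic choice.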
