Pooling Replicates in Omics Analysis: Trade-off Between Cost and Data Robustness?
sardius • 0 · 11 days ago

Hello,

I am planning a comparative analysis of transcriptomes or proteomes across 10 different cell lines. Originally, I intended to prepare 3 replicates for each cell line, resulting in a total of 30 samples. However, due to budget constraints, I am considering pooling the 3 replicates into a single sample per cell line, resulting in 10 pooled samples for analysis.

Would this pooling approach still yield interpretable and reliable data? Or, despite the increased cost, is analyzing all 30 individual replicates the only viable option for robust and meaningful results?

Thank you in advance for your advice.

Replicates • Omics • Pooling
eric.blalock ▴ 20 · 6 days ago

On the subject of DE, way back in the days of microarrays, this pooling/sub-pooling issue was big. Gary Churchill warned everybody here: https://www.nature.com/articles/ng1031z

We published a paper on sub-pooling strategies and their statistical consequences in transcriptional profiling; some of what we covered might still be useful. It is open access here (I think 13 people have read it, but why not 14?): https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-4-26

But that was early days; some later references are more convincing, e.g., https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0050986

Historical blah: it sounds like you are considering a "total" pooling strategy, and I think that strategy is a train wreck for DE statistics. Like the other posters, I do not recommend it.

The early array days (circa 2000-2007) saw many 'n = 1' experiments (+/- pooling). At first they were proof-of-concept studies by the inventors/vendors, but that established a precedent, and because the arrays were quite expensive, researchers began following suit (especially with the two-channel Stanford arrays, but also on Affy systems). Vendors were happy to have the business and offered little pilot programs with a few arrays. The attitude was "it's just a screening tool", and approaches for selecting DEGs involved 1) using sub-local error (taking a subset of measures from other RNA species within X distance, as a percentage or a signal-intensity range, of the measure in question to 'estimate' the variance of the RNA species in question), and/or 2) setting a single log2 fold-change criterion for all RNA species. The former ended up being discredited for not representing the biological variance of the RNA species being measured; the latter for having no assumed null distribution against which to compare the result, no estimate of variance, an assumption of central tendency, and an assumption that different RNA species need the same fold change to be significant (since disproven; although L2FC is a critical component of analysis now, it is usually combined with p or q values in a volcano plot and is not good on its own). With both approaches the work had poor reproducibility, which started discrediting the technology. There was an outcry from the scientific community (the vendors started to get alarmed too), and leading journals began publishing experimental design guides, essentially implying that they would not publish results from designs that did not meet those criteria. The advent of RNA-seq did not obviate the need for an experimental design that incorporates the standard principles of randomization, replication, and balance.
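To make that last point concrete, here is a minimal simulated sketch (not from the papers above; the gene counts, thresholds, simple per-gene t-test, and BH adjustment are illustrative assumptions) of how a fold-change-only cutoff differs from a volcano-style call that also requires a q-value:

```python
# Minimal sketch on simulated data: fold-change cutoff alone vs. fold change
# combined with an adjusted p-value. All numbers/thresholds are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_genes, n_rep = 1000, 3

# Simulate log2 expression for two conditions, n = 3 replicates each;
# gene-specific variances differ, so one fold-change bar cannot fit all genes.
sigma = rng.uniform(0.2, 1.5, size=n_genes)
group_a = rng.normal(0, sigma[:, None], size=(n_genes, n_rep))
group_b = rng.normal(0, sigma[:, None], size=(n_genes, n_rep))
group_b[:50] += 1.0  # 50 genes with a true 1.0 log2FC shift

l2fc = group_b.mean(axis=1) - group_a.mean(axis=1)
t, p = stats.ttest_ind(group_b, group_a, axis=1)

# Simple Benjamini-Hochberg adjustment
order = np.argsort(p)
q = np.empty_like(p)
q[order] = np.minimum.accumulate(
    (p[order] * n_genes / np.arange(1, n_genes + 1))[::-1]
)[::-1]

fc_only = np.abs(l2fc) > 1.0                 # unitary fold-change criterion
volcano = (np.abs(l2fc) > 1.0) & (q < 0.05)  # fold change combined with q-value
print(f"FC-only calls: {fc_only.sum()}, FC + q-value calls: {volcano.sum()}")
```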

They say wisdom is learning from other people's mistakes, so rather than go through that again, learn from us. Consider reducing scope: decrease the number of cell lines, but preserve n = 3 per cell line at a minimum. (Is n = 3 "robust"? With n = 3 you should be able to tell which DEGs or DEPs are likely to be robust; it will be a narrower slice than you would get with a larger n, and you won't know whether they truly are robust until someone replicates your work.) IMHO, nothing is as wasteful as an underpowered experiment (I say that having done a few myself).
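If it helps with the scope-versus-replicates decision, here is a rough, hedged power simulation; the effect size, variance, and alpha are made-up assumptions, and a real calculation should use pilot estimates from your own system:

```python
# Rough power sketch (simulated): how detection power for a single feature with
# a true 1.0 log2FC changes with replicates per group. Assumptions are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_l2fc, sigma, alpha, n_sim = 1.0, 0.8, 0.05, 2000

for n_rep in (2, 3, 5, 8):
    a = rng.normal(0.0, sigma, size=(n_sim, n_rep))
    b = rng.normal(true_l2fc, sigma, size=(n_sim, n_rep))
    _, p = stats.ttest_ind(b, a, axis=1)
    print(f"n = {n_rep} per group: power ~ {(p < alpha).mean():.2f}")
# With n = 1 per group there is no within-group variance to estimate, so power
# is 0 no matter how many samples were pooled into that single measurement.
```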


Thanks for the historical blah! Happy to be reader #14.

Hopefully the past 20 years have demonstrated that a statement like "Figure 2 shows that approximately equivalent power to non-pooling can be achieved if the number of gene chips is reduced but the number of samples is increased." needs to be accompanied by "but never fewer than 2 chips per condition, below which power is 0 regardless of the number of pooled samples." (ha!)

ATpoint 88k · 10 days ago

Without further details, I would recommend against unreplicated experiments, as replication is key for any meaningful pairwise statistics; unreplicated designs severely limit your ability to perform inference. The fact that your single sample is a merge of several replicates does not change this. If the budget is limited, reconsider what the main scientific goal is and cut the experiment down so that the actual readout is robust and meaningful, rather than going too big and sacrificing quality.
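As a small illustration of the point about unreplicated designs (the expression values below are hypothetical), the within-group variance that any pairwise test relies on simply cannot be estimated from one pooled measurement per cell line:

```python
# Why an unreplicated (pooled) design blocks pairwise statistics: with one
# measurement per cell line there is no within-group variance to estimate,
# so any test statistic that needs it is undefined. Numbers are made up.
import numpy as np

# Three replicates per cell line: variance is estimable.
line_a = np.array([7.9, 8.2, 8.0])   # hypothetical log2 expression values
line_b = np.array([9.1, 8.8, 9.0])
print(np.var(line_a, ddof=1), np.var(line_b, ddof=1))  # finite values

# One pooled sample per cell line: pooling before measurement hides the
# replicate-to-replicate spread, and ddof=1 variance of one value is undefined.
pooled_a = np.array([line_a.mean()])
pooled_b = np.array([line_b.mean()])
print(np.var(pooled_a, ddof=1))  # nan (warns: degrees of freedom <= 0)
```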

7 days ago

The answer to this entirely depends on what you intend to do with the data you obtain. Without replicates, you will be able to make inferences about differences between groups of cell lines, but not about individual cell lines.

If you wish to say "gene x is DE between cell line A and cell line B", then you must have replicates of those cell lines.

If you wish to say "gene Y is different between cell line from Cancer X and cell lines from Cancer Y", or if you wish to say "The transcriptomics and proteomics for Gene Z correlated across cell ines, but those for gene W don't", then your replicates would count as techincal replicates.

See my answer here: Replicates for RNA-seq from 1 cell line undergoing different treatments
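A short sketch of that distinction, with purely hypothetical values: with one (pooled) measurement per cell line, the cell lines act as the replicates for a group-level comparison, while a line-versus-line comparison has nothing to test:

```python
# Hypothetical example: one measurement of gene X per cell line,
# 5 lines from "Cancer X" and 5 lines from "Cancer Y".
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
cancer_x_lines = rng.normal(8.0, 0.5, size=5)  # one value per cell line
cancer_y_lines = rng.normal(9.0, 0.5, size=5)

# Group-level question: the cell lines are the replicates, so a test is possible.
t, p = stats.ttest_ind(cancer_x_lines, cancer_y_lines)
print(f"Cancer X vs Cancer Y: p = {p:.3g}")

# Cell-line-level question ("is gene X DE between line A and line B?") has
# n = 1 per line, so there is no test to run without within-line replicates.
```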

