Question

Sample size estimation for differential ChIP-Seq

1

2.5 years ago

Russ ▴ 500

Hello all,

I am struggling to find references that would guide my experimental design. Specifically, I am trying to determine the optimal sample size for a differential ChIP-seq experiment involving farm animals. I would like to investigate differential TF binding in affected vs. unaffected animals. Confounding variables may include age and/or farm differences. Breed will be the same.

The ENCODE consortium recommends a minimum of 2 replicates, but that seems to refer to a simple ChIP-seq experiment defining binding sites, rather than differential binding between two groups.

I have found the following articles, but find they don't address the issue directly:

Zuo C, Keleş S. A statistical framework for power calculations in ChIP-seq experiments. Bioinformatics. 2014;30(6):753-760. doi:10.1093/bioinformatics/btt200
Zhao S, Li CI, Guo Y, et al. RnaSeqSampleSize: real data based sample size estimation for RNA sequencing. BMC Bioinformatics. 2018;19:191. doi:10.1186/s12859-018-2191-5
Li CI, Samuels DC, Zhao YY, Shyr Y, Guo Y. Power and sample size calculations for high-throughput sequencing-based experiments. Briefings in Bioinformatics. 2018;19(6):1247-1255. doi:10.1093/bib/bbx061

Any help is greatly appreciated.

Russ

ChiP-seq sample-size differential-expression • 1.1k views

Updated 12 months ago by Ram 43k • written 2.5 years ago by Russ ▴ 500

score 1 · Answer 1 · 2021-10-20

1

2.5 years ago

Friederike 8.9k

It's a great question! I'm not aware of any robust benchmarking study that has investigated this, but I think one can draw some insights from bulk RNA-seq, since most packages for differential ChIP-seq analysis rely on the same methods. The general rule is that you should do twice as many replicates as you can afford (i.e. as many as possible). In real life, that typically translates to 3-5 replicates for bulk RNA-seq (you can read more in the systematic study of replicate numbers by Schurch and Gierlinski).

If you can, keep external variables such as age/sex/diet as constant as possible to reduce noise. It has also been shown over and over again that it's usually more useful for downstream analyses to invest in more replicates rather than greater sequencing depth. That being said, for ChIP-seq studies you should still invest in decent sequencing depth, ideally aiming for 5-10x coverage for the input sample.
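To get a rough feel for how the number of replicates trades off against variability, one option is the normal approximation on the log scale that is often used for negative-binomial count data in RNA-seq power calculations. The sketch below is only that: the function name `replicates_per_group` is made up for illustration, the mean count, coefficient of variation and fold change are placeholder values rather than estimates from any real ChIP-seq data, and the per-region `alpha` ignores multiple testing as well as covariates such as age or farm.

```python
import math
from statistics import NormalDist


def replicates_per_group(fold_change, mean_count, cv, alpha=0.05, power=0.8):
    """Approximate replicates per group needed to detect `fold_change`
    in a single region (two-sided test, normal approximation on log counts).

    mean_count: assumed average read count in the region (placeholder value)
    cv:         assumed biological coefficient of variation between animals
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    # Variance of a log count: sampling noise (1/mean) plus biological noise (cv^2)
    var_log = 1.0 / mean_count + cv ** 2
    n = 2 * (z_alpha + z_beta) ** 2 * var_log / math.log(fold_change) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Illustrative only: outbred farm animals may well be noisier than cell lines.
    for cv in (0.2, 0.4, 0.8):
        n = replicates_per_group(fold_change=2, mean_count=50, cv=cv)
        print(f"cv={cv}: about {n} replicates per group")
```

Under this approximation the required number of replicates grows with the square of the biological variability, whereas more depth only shrinks the `1/mean_count` term, which is one way to see why extra replicates usually pay off more than extra reads.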

2.5 years ago by Friederike 8.9k

0

Thanks for your quick answer, Friederike. Agreed re: minimizing confounding factors/noise, but the reality is that I will have to accept some variability in this potential experiment. Will try to source as many samples as possible. I will also likely stick to ENCODE's recommendation of 20 million reads/sample for ChIP (https://www.encodeproject.org/chip-seq/transcription_factor/).

2.5 years ago by Russ ▴ 500

1

That's 20 million reads after alignment, though, so aiming for 50 million is probably prudent. Not sure how the size of your animal's genome compares to the mouse/human genome, but you definitely don't want sequencing depth to be the limiting factor (cost-wise it's probably negligible compared to the cost of the full experiment).
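The arithmetic behind these numbers can be sketched in a few lines. This is a back-of-the-envelope helper, not a recommendation: the function names are invented here, and the genome size (2.7 Gb), read length (100 bp) and usable fraction of reads (40%) are placeholders to be replaced with values for the actual species and library.

```python
def fold_coverage(n_reads, read_length, genome_size):
    """Average fold coverage: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def raw_reads_needed(target_usable_reads, usable_fraction):
    """Raw reads to sequence so that `target_usable_reads` remain after
    alignment and filtering, given the fraction expected to survive."""
    return target_usable_reads / usable_fraction


if __name__ == "__main__":
    genome_size = 2.7e9   # placeholder genome size in bp, NOT a real estimate
    read_length = 100     # assumed read length in bp

    cov = fold_coverage(50e6, read_length, genome_size)
    print(f"50 million reads give roughly {cov:.2f}x coverage")

    raw = raw_reads_needed(20e6, usable_fraction=0.4)  # assumed 40% usable
    print(f"About {raw / 1e6:.0f} million raw reads for 20 million usable reads")
```

With these placeholder values, 50 million reads correspond to well under 5x coverage, so an input sample aimed at 5-10x coverage would need considerably more reads than the ChIP samples.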

2.5 years ago by Friederike 8.9k