Sample size estimation for differential ChIP-Seq
6 weeks ago
Russ ▴ 480

Hello all,

I am struggling to find references that would guide my experimental design. Specifically, I am trying to determine the optimal sample size for a differential ChIP-seq experiment involving farm animals. I would like to investigate differential TF binding in affected vs. unaffected animals. Confounding variables may include age and/or farm differences. Breed will be the same.

The ENCODE consortium recommends a minimum of 2 replicates, but that seems to refer to a simple ChIP-seq experiment defining binding sites, rather than to differential binding between two groups.

I have found the following articles, but find they don't address the issue directly:

  • Zuo C, Keleş S. A statistical framework for power calculations in ChIP-seq experiments. Bioinformatics. 2014;30(6):753-760. doi:10.1093/bioinformatics/btt200
  • Zhao, S., Li, CI., Guo, Y. et al. RnaSeqSampleSize: real data based sample size estimation for RNA sequencing. BMC Bioinformatics 19, 191 (2018). https://doi.org/10.1186/s12859-018-2191-5

  • Chung-I Li, David C Samuels, Ying-Yong Zhao, Yu Shyr, Yan Guo, Power and sample size calculations for high-throughput sequencing-based experiments, Briefings in Bioinformatics, Volume 19, Issue 6, November 2018, Pages 1247–1255, https://doi.org/10.1093/bib/bbx061

Any help is greatly appreciated.

Russ

6 weeks ago

It's a great question! I'm not aware of any robust benchmarking study that has investigated this, but I think one can draw some insights from bulk RNA-seq, since most packages for differential ChIP-seq analysis rely on the same methods. The general rule is that you should do twice as many replicates as you can afford (i.e. as many as possible). In real life, that typically translates to 3-5 replicates for bulk RNA-seq (you can read more in the systematic study of replicate numbers by Schurch and Gierlinski). If you can, keep external variables such as age/sex/diet as constant as possible to reduce noise. It has also been shown over and over again that it's usually more useful for downstream analyses to invest in more replicates rather than greater sequencing depth. That being said, for ChIP-seq studies you should still invest in decent sequencing depth, ideally aiming for 5-10x coverage for the input sample.
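To get a feel for how noise between animals drives the replicate number, here is a rough back-of-the-envelope sketch. It is not taken from any of the packages mentioned in this thread. It uses the textbook normal-approximation formula for comparing two group means at a single peak, and it assumes the normalized log2 signal is roughly normal with the same spread in both groups. The function name and the example effect size and standard deviation are made up for illustration. The formula ignores count-based dispersion modelling and multiple testing across peaks, so treat the result as an optimistic lower bound.

```python
import math
from statistics import NormalDist


def replicates_per_group(log2_fold_change, sd_log2, alpha=0.05, power=0.8):
    """Approximate animals needed per group to detect a shift at ONE peak.

    Normal approximation for a two-sided, two-sample comparison of means:
        n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / effect)^2
    All inputs are hypothetical planning values, not estimates from real data.
    """
    if log2_fold_change == 0:
        raise ValueError("effect size must be non-zero")
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)  # quantile for the significance level
    z_power = z.inv_cdf(power)          # quantile for the desired power
    n = 2 * ((z_alpha + z_power) * sd_log2 / abs(log2_fold_change)) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical scenario: a 2-fold change in binding (log2 FC = 1)
    # with increasing between-animal variability on the log2 scale.
    for sd in (0.5, 0.75, 1.0):
        print(f"sd={sd}: n per group >= {replicates_per_group(1.0, sd)}")
```

The required number grows with the square of the noise-to-effect ratio, which is one way to see why uncontrolled age or farm differences get expensive quickly. Using a stricter `alpha` to mimic a genome-wide correction pushes the number up further.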


Thanks for your quick answer, Friederike. Agreed re: minimizing confounding factors/noise, but the reality is that I will have to accept some variability in this potential experiment. I will try to source as many samples as possible. I will also likely stick to ENCODE's recommendation of 20 million reads/sample for ChIP (https://www.encodeproject.org/chip-seq/transcription_factor/).


That's 20 million reads after alignment though, so aiming for 50 million is probably prudent. I'm not sure how the size of your animal's genome compares to the mouse/human genome, but you definitely don't want the sequencing depth to be the limiting factor (cost-wise it's probably negligible compared to the cost of the full experiment).
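The arithmetic behind those numbers can be sketched as follows. The 40% "usable fraction" is only what the 20 vs. 50 million figures above imply, not a measured alignment rate, and the read length and genome size in the example are placeholders to be replaced with the values for your species and sequencing setup. The function names are made up for illustration.

```python
import math


def raw_reads_needed(usable_reads, usable_fraction):
    """Raw reads to sequence so that `usable_reads` survive alignment/filtering."""
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return math.ceil(usable_reads / usable_fraction)


def fold_coverage(reads, read_length_bp, genome_size_bp):
    """Average genome coverage (x-fold) from a number of reads."""
    return reads * read_length_bp / genome_size_bp


if __name__ == "__main__":
    # 20 million usable reads at an assumed 40% usable fraction
    target = raw_reads_needed(20_000_000, 0.4)
    print(f"raw reads to aim for: {target:,}")

    # Placeholder values: 100 bp reads, 2.5 Gb genome
    cov = fold_coverage(target, 100, 2_500_000_000)
    print(f"approximate coverage: {cov:.1f}x")
```

For a large genome the same read count gives lower coverage, so the input sample in particular may need more reads than the ChIP samples to reach a given fold coverage.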
