Question: ChIP-seq input DNA control for normalization
4.4 years ago by
Z-F20 wrote:

Hi everyone. I am new to ChIP-seq and am planning to do it for a poorly characterized TF. We are going to do ChIP on HEK293 cells transfected with the WT and two mutant forms of our gene of interest, which makes three samples to be sequenced. With the negative control ("input DNA"), the number of samples for NGS would be six (each sample plus its corresponding input DNA), which is quite expensive for us. While looking for a way to reduce the cost, I was wondering whether it is possible to find raw sequencing files of control DNA for HEK cells in public databases and use them as the control to normalize the analysis. Is that possible? Are such data available and, if so, can we use them instead of our own input DNA? As I understand it, sonication is not a truly random process, so we may have to sequence each sample's input DNA separately. Am I correct? If so, can we sequence only one of the input DNAs (for example WT) as the control for all three samples, or do we need a separate input DNA for each sample?

I am looking forward to hearing from you.

chip-seq next-gen • 5.8k views
modified 2.6 years ago by Bogdan1000 • written 4.4 years ago by Z-F20

Typically, you take a sample of sonicated chromatin and split it into X tubes, where X is the number of antibodies to try, plus one. The extra tube goes through all the same library prep stages, except that it is never bound to magnetic beads and washed. That is the input. So you don't have an input per antibody! That would be quite expensive :P

written 4.4 years ago by John12k

Dear John, thanks for your reply. Actually, the problem is that I am going to sonicate three similar HEK293 cell lines, all transfected with the same gene but in WT and mutant forms, and for all of them I use only one antibody (anti-GFP). Can I still use one "input DNA" for all 3 samples (3 different sonicated chromatins)?

modified 4.4 years ago • written 4.4 years ago by Z-F20

I see. Hm. Well, this isn't really a bioinformatics question, so I'm afraid the answer might leave a bitter taste in the mouths of my peers, but essentially the answer to your question depends on how "exploratory" this experiment is. The correct answer is "yes, every single sample of the 6 needs an Input control", since the input chromatin is different every time and you don't have biological or technical replicates. However, practically, this could be a waste of time and money.

I'm guessing you're using GFP because you previously did imaging work, found that the protein localizes in the nucleus, found that it also binds DNA, and now you want to see where it goes, but you don't have the time/money to generate a ChIP antibody for the actual protein? (This is under the assumption that the TF really is poorly characterized, as per your OP.) This already puts you into the realm of "not conclusive" if you wanted to publish this data, since sticking a huge molecule like GFP onto the protein changes how/where it can bind DNA. Furthermore, GFP-ChIP is notorious for having high background noise (if you tag the protein with something for GFP to bind to, rather than make a hybrid protein). Far better to spend money on buying/generating the antibody than on the 6 Inputs. Or, alternatively, use some new commercial thing that will overcome the issues with the GFP. This would be a better use of your money than Input controls.

Finally, and I feel a bit naughty saying this, but you can always generate a ChIP'd library and an Input library from each of the 6 samples, then not sequence the Inputs unless it becomes necessary. Libraries can be stored for many, many years, and an input costs essentially nothing other than your time to generate. You can pull it out of the freezer and sequence it if the reviewers ask for it. Really though, your WT sample is a better control than input anyway. At least, that holds for an exploratory analysis of the effect of the mutation, and so long as you do the fixation and sonication on samples processed all on the same day with the same buffers, etc. Don't be using different fixatives or fixation durations. That's when your input control is most valuable. Don't be sonicating Line 1 on Monday, then Line 2 on Friday after Dave from Accounting has been fiddling with the machine's power output to save a few cents. Keep it nice and consistent, and maybe, just maybe, you'll probably have to sequence the Input anyway ;-)

written 4.4 years ago by John12k
4.4 years ago by
Michele Busby2.1k
United States
Michele Busby2.1k wrote:

Hi Z-F,

So controls are used to keep you from calling peaks where there are no peaks. This can happen because of read pileups caused by things other than your protein binding. Things that can cause this are PCR duplications (solved by using paired end reads), copy number variations, and areas of open and closed chromatin.

The pileups of open and closed chromatin will be big or small depending on how even your shearing is and what you use for size selection. There are some papers with huge bumps near transcription start sites. We do not always see that in our data and often people just use one control per genetic background to control for the CNV problems.

In your case, since you are looking at differences between your samples, you might be able to get away with no control if you assume that the background is the same in all three cases. Then, when you compare the samples, you will essentially divide out the noise. However, a lot of the peak callers are optimized around the expectation of having a control.
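
A minimal sketch of the "divide out the noise" idea, assuming you already have read counts per genomic bin for the WT and mutant ChIP libraries. The function names, the toy counts and the pseudocount of 1 are illustrative assumptions, not part of any particular peak caller:

```python
import math

def counts_per_million(counts):
    """Scale raw bin counts by library size so libraries of different depth are comparable."""
    total = sum(counts)
    if total == 0:
        raise ValueError("library has no reads")
    return [c * 1e6 / total for c in counts]

def log2_ratio(mutant_counts, wt_counts, pseudocount=1.0):
    """Per-bin log2(mutant / WT) on depth-normalized counts.

    Background that is shared by both samples (open chromatin, CNVs)
    gives ratios near 0; bins where binding differs move away from 0.
    The pseudocount (an assumption) avoids division by zero in empty bins.
    """
    if len(mutant_counts) != len(wt_counts):
        raise ValueError("both libraries must use the same bins")
    mut = counts_per_million(mutant_counts)
    wt = counts_per_million(wt_counts)
    return [math.log2((m + pseudocount) / (w + pseudocount))
            for m, w in zip(mut, wt)]

# Toy example: same binding profile, mutant library sequenced twice as deep.
wt_bins = [10, 20, 70]
mutant_bins = [20, 40, 140]
ratios = log2_ratio(mutant_bins, wt_bins)
```

In this toy case every ratio is close to 0, because the only difference between the two libraries is sequencing depth. This only works under the assumption stated above, that the background really is the same in all samples.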

There are plenty of existing controls for HEK293 available from ENCODE:

You could try using one of those, or merge all of them for a deep read coverage. I think that will get you the science. I don't know if it will get you a publication.

Reviewers might not agree as conventionally people use one control per sample because of the previously observed peaks in the control near transcription start sites.
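
The idea of merging several public controls into one deep control can be sketched as follows, again assuming per-bin read counts over identical bins. The function names, the toy numbers and the pseudocount are hypothetical, and real peak callers scale the control in their own ways:

```python
def pool_controls(control_libraries):
    """Sum per-bin counts across several control libraries (same bins, same order)."""
    if not control_libraries:
        raise ValueError("need at least one control library")
    n_bins = len(control_libraries[0])
    if any(len(lib) != n_bins for lib in control_libraries):
        raise ValueError("all control libraries must use the same bins")
    return [sum(bin_counts) for bin_counts in zip(*control_libraries)]

def fold_enrichment(chip_counts, control_counts, pseudocount=1.0):
    """Per-bin ChIP / control ratio after scaling the control to the ChIP depth."""
    if len(chip_counts) != len(control_counts):
        raise ValueError("ChIP and control must use the same bins")
    control_total = sum(control_counts)
    if control_total == 0:
        raise ValueError("control has no reads")
    scale = sum(chip_counts) / control_total
    return [(c + pseudocount) / (k * scale + pseudocount)
            for c, k in zip(chip_counts, control_counts)]

# Toy example: two shallow controls pooled, then compared with one ChIP library.
pooled = pool_controls([[1, 2, 3], [4, 5, 6]])
enrichment = fold_enrichment([10, 10, 80], [5, 5, 5])
```

Pooling only increases depth; it does not make a public control match your own shearing, size selection or genetic background.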

The Oshlack lab did a paper saying you can use an H3 ChIP instead of WCE (whole-cell extract, i.e. input) as your control. We have found that this is sometimes easier to get to work than WCE.


written 4.4 years ago by Michele Busby2.1k

Dear Michele,

Thanks for your comprehensive reply. It helped a lot.

written 4.4 years ago by Z-F20

solved by using paired end reads

A true PCR duplicate will stay a duplicate, regardless of the sequencing mode.

modified 2.8 years ago • written 2.8 years ago by ATpoint36k

Yes, the PCR duplicates will still be there, but if you use paired-end reads you can mark them and eliminate them from your analysis.
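
A simplified sketch of why paired-end data helps here: with both fragment ends known, two read pairs are flagged as duplicates only if both ends match, whereas single-end data only gives the 5' position and strand, so independent fragments that happen to start at the same base get collapsed too. The tuple layout and function name are assumptions for illustration; real tools such as Picard MarkDuplicates or samtools markdup handle many more cases:

```python
def mark_duplicates(fragments, paired_end=True):
    """Return one boolean per fragment, True if it repeats an earlier fragment.

    Each fragment is (chrom, start, end, strand). With paired_end=True the
    full fragment (both ends) is the key; otherwise only the 5' position
    and strand are used, as would be the case for single-end reads.
    """
    seen = set()
    flags = []
    for chrom, start, end, strand in fragments:
        if paired_end:
            key = (chrom, start, end, strand)
        else:
            five_prime = start if strand == "+" else end
            key = (chrom, five_prime, strand)
        flags.append(key in seen)
        seen.add(key)
    return flags

# Toy example: the second fragment shares a start with the first but has a
# different end, so it is a distinct molecule; the third is an exact copy.
frags = [
    ("chr1", 100, 300, "+"),
    ("chr1", 100, 350, "+"),
    ("chr1", 100, 300, "+"),
]
paired_flags = mark_duplicates(frags, paired_end=True)
single_flags = mark_duplicates(frags, paired_end=False)
```

With the paired-end key only the exact copy is flagged, while the single-end key also flags the distinct second fragment.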

written 2.6 years ago by Michele Busby2.1k
4.4 years ago by
jotan1.2k wrote:

Michele Busby has provided a good explanation and potential solution.

I would just add that I don't think it's a good idea to eliminate a negative control (input sequencing) for the sake of saving a few hundred dollars. You would be unnecessarily introducing an additional blind assumption into your experimental design: that the input sequencing for all your samples is identical. You'd run the risk of having artefactual and uninterpretable results. Not to mention potentially unpublishable results.

In all likelihood, the input sequencing will be very similar between all samples. But that's actually something that you want to see and really, something you need to test.
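
One simple way to test this once the inputs are sequenced is to correlate their binned coverage. The sketch below assumes per-bin read counts over identical bins for two input libraries; the function name and the toy values are illustrative, and no particular correlation threshold is implied:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally binned coverage vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length with at least 2 bins")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("coverage is constant in at least one library")
    return cov / math.sqrt(var_x * var_y)

# Toy example: the mutant input has the same profile as the WT input
# at twice the depth, so the correlation is at its maximum.
input_wt = [12, 30, 8, 55, 20]
input_mutant = [24, 60, 16, 110, 40]
r = pearson(input_wt, input_mutant)
```

Pearson correlation is insensitive to sequencing depth, so a high value here says the background profiles agree in shape, which is the assumption you would otherwise be making blindly.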

written 4.4 years ago by jotan1.2k

Dear jotan, Thanks, I think we should sequence the "input DNA" as all of you mentioned. Thanks for the reply.

written 4.4 years ago by Z-F20
2.6 years ago by
Palo Alto, CA, USA
Bogdan1000 wrote:

Dear all,

Related to your conversation above, may I ask: if we have multiple samples from ChIP experiments (where proteins A, B, and C were immunoprecipitated on the chromatin) and input DNA (from samples A, B, and C), would we normalize them all together (i.e. ChIP + input) in edgeR/csaw, for example? Many thanks,


written 2.6 years ago by Bogdan1000

Are samples A, B, and C the same biological sample?

written 2.6 years ago by Michele Busby2.1k