Question

ChIPseq - Identical peaks in IP and input samples

0

7.5 years ago

fraseriainyoung • 0

I am new to NGS technologies but have been producing and analysing some RNA-seq and ChIP-seq data recently with a view to integrating the two datasets. I have found these forums very useful in providing tips and advice so was hoping that someone might be able to help me with some issues I have experienced. Whilst the RNAseq analysis has gone very well, the ChIP-seq is proving a lot more problematic and I think it would be best to get a second opinion before I discard the data as being junk.

My experimental outline is as follows. Libraries were prepared for two biological replicates (IP and input control for each), and 75 bp paired-end sequencing was performed on an Illumina HiSeq 4000 platform. Reads were aligned to the mm10 reference genome using BWA, and MACS2 was used to call peaks. This yielded only a very small number of peaks (~250) for each sample. My major concern is that when I view the alignment files in IGV, the IP and input tracks are identical. I would have expected wide genomic coverage with a near-flat baseline for my input, and more sparsely distributed, distinct peaks for my IP samples. Instead, I have identical strong peaks in all 4 samples. MACS2 does identify significant enrichment of some of these peaks in the IP samples, but I cannot regard these as true binding sites because there are matching peaks in the input controls. No peaks were identified in my IP samples that could not also be found in the input. I therefore have a few questions:

1) What might be the cause of these specific sharp peaks in both input and IP samples? Areas of open chromatin?

2) Is it likely that the antibody used (custom-made) is non-specifically pulling down sonicated DNA? Or could the antibody not be precipitating any DNA at all and I have only sequenced input DNA non-specifically bound to the agarose beads for my IP samples?

3) Finally, I have noticed that only 48-49% of the total reads (for all input and IP samples) were aligned by BWA. Could some of the true binding sites be hidden within these unaligned sequences, and would it therefore be worth going back to try to improve the alignment using less stringent mismatch criteria or another alignment tool? Or is this simply clutching at straws, and the data are just junk?

Any suggestions would be greatly appreciated.

ChIP-Seq • 7.8k views

ADD COMMENT • link updated 7.5 years ago by colin.kern ★ 1.1k • written 7.5 years ago by fraseriainyoung • 0

0

Hello fraseriainyoung!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=72250

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY • link 7.5 years ago by WouterDeCoster 47k

0

Some ideas:

Are there known binding sites that you could use to assess the quality of your IP?
Have you checked the correlation between your replicates, IP and input? With bamCorrelate (deepTools), for instance.
Do the peaks occur at specific positions, for instance at gene TSSs?
Have you checked the FastQC profiles of your IP and input data?
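
The correlation check suggested above can be illustrated with a small Python sketch. It is not what deepTools does internally; it only shows the idea of binning read start positions and computing a Pearson correlation between two samples. The positions, genome length and bin size are made-up toy values, and `bin_counts` and `pearson` are hypothetical helper names.

```python
from math import sqrt

def bin_counts(read_starts, genome_length, bin_size):
    """Count reads per fixed-width bin from 0-based read start positions."""
    n_bins = (genome_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length count vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant vectors")
    return cov / sqrt(var_x * var_y)

# Toy data (assumed): read start positions on a 400 bp "genome".
ip_starts = [5, 12, 110, 115, 118, 250, 390]
input_starts = [7, 15, 112, 113, 119, 260, 395]

ip_bins = bin_counts(ip_starts, genome_length=400, bin_size=100)
input_bins = bin_counts(input_starts, genome_length=400, bin_size=100)
r = pearson(ip_bins, input_bins)
```

If the IP correlates with the input about as well as the two IP replicates correlate with each other, that is a hint that the IP did not enrich anything beyond background.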

ADD REPLY • link 7.5 years ago by Carlo Yague 8.6k

0

Thanks Carlo.

Unfortunately, there are no known binding sites to compare my data against. I am going to check the correlation between the samples with deepTools. The peaks appear randomly distributed, with the majority falling in intergenic regions. The FastQC profiles for each of the samples, while not perfect, were about as good as can be expected for the low starting amount of DNA from a ChIP. The only warning raised was for PCR duplication levels.

ADD REPLY • link 7.5 years ago by fraseriainyoung • 0

0

Are you using a Clontech kit by any chance?

ADD REPLY • link 7.5 years ago by Ryan Dale 5.0k

score 0 · Answer 1 · 2016-10-27

0

7.5 years ago

Marge ▴ 320

Hello,

It's quite funny: I am working on a ChIP-seq dataset that seems to share the odd appearance with yours. I have been on these data for a while now, I still don't have The Correct Answers but I can definitely share what I think.

One question up front would be: what are you trying to ChIP? Is it a common protein for which a lot of datasets have already been produced (maybe in a different system)? If so, I would try applying the same analysis approach to one of those datasets: it will tell you whether your data are "different" or whether what you see is just a characteristic of what you are precipitating.

1) I think regions of very highly enriched signal present in both the ChIPped sample and the input are most likely just artefacts (something like genomic regions that have a high propensity for being sequenced). Some genomic regions are already known to have this problem (blacklisted regions); you can find more info here: https://sites.google.com/site/anshulkundaje/projects/blacklists. The recommendation is to remove them from your output.
2) In principle, before doing the experiment you should have tested the efficiency of your antibody and the actual enrichment with a normal IP.
3) Having less than 50% of the reads aligned sounds on the low side to me. Are you referring to unique mappers or to mapped reads in general? In any case, this could simply be related to the biological features of what you are ChIPping (e.g. if you are precipitating something that binds repeats, then low mapping would be more or less expected with standard parameters).
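
Removing blacklisted regions from a peak list comes down to an interval-overlap filter. The sketch below assumes BED-style half-open coordinates and uses invented toy intervals; `remove_blacklisted` is a hypothetical helper, and in practice a dedicated tool such as `bedtools intersect -v` is the more usual route.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end

def remove_blacklisted(peaks, blacklist):
    """Return the peaks that do not overlap any blacklisted region.

    Both arguments are lists of (chrom, start, end) tuples.
    """
    # Group blacklist regions by chromosome so each peak is only
    # compared against regions on its own chromosome.
    by_chrom = {}
    for chrom, start, end in blacklist:
        by_chrom.setdefault(chrom, []).append((start, end))

    kept = []
    for chrom, start, end in peaks:
        regions = by_chrom.get(chrom, [])
        if not any(overlaps(start, end, s, e) for s, e in regions):
            kept.append((chrom, start, end))
    return kept

# Toy data (assumed): not real mm10 peaks or blacklist entries.
peaks = [("chr1", 100, 300), ("chr1", 5000, 5200), ("chr2", 50, 150)]
blacklist = [("chr1", 250, 400), ("chr2", 1000, 2000)]

filtered = remove_blacklisted(peaks, blacklist)
```

Comparing the number of peaks before and after filtering shows how much of the shared IP/input signal the blacklist actually explains.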

ADD COMMENT • link 7.5 years ago by Marge ▴ 320

0

Thanks for the helpful reply Marge. Unfortunately, the transcription factor that I ChIP-ed for is not well studied so there is very little preliminary data to compare against.

1) I am aware of the blacklisted regions. While I did not remove them from my analysis, I overlaid them on my sequence tracks, and only a handful of the peaks fall within a blacklisted region, so I suspect there is something else at play here.
2) I did do some ChIP-qPCR prior to sending samples for sequencing. Obviously, without any previously published or confirmed binding sites for my TF of interest, this is like looking for the proverbial needle in a haystack. From the qPCR, I did see some enrichment of predicted binding sites in the ChIP samples relative to the isotype control IP, so I was hoping the ChIP-seq might confirm these as well as identify new sites. Unfortunately, with the benefit of hindsight, I think what the qPCR may have shown was simply increased non-specific chromatin pull-down in the antibody sample compared with the isotype antibody.
3) I was referring to uniquely mapped sequences, so this is something that I am really looking to improve to see whether it can make some of the data less murky.
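
Since "mapped" and "uniquely mapped" give quite different percentages, it can help to compute both from the same alignments. The sketch below parses SAM-format text lines and counts primary records only. It treats MAPQ at or above a threshold as a stand-in for "uniquely mapped"; that threshold (`min_mapq=10`) is an assumption rather than a standard, and the records are toy data.

```python
def mapping_summary(sam_lines, min_mapq=10):
    """Count total, mapped and confidently mapped primary SAM records."""
    total = mapped = confident = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        mapq = int(fields[4])
        # Skip secondary (0x100) and supplementary (0x800) records so
        # that each read is counted once.
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        if flag & 0x4:
            continue  # read is unmapped
        mapped += 1
        if mapq >= min_mapq:
            confident += 1
    return {"total": total, "mapped": mapped, "confident": confident}

# Toy SAM records (assumed): one confident hit, one MAPQ-0 hit,
# one unmapped read and one secondary alignment.
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t16\tchr1\t200\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r1\t256\tchr1\t500\t0\t10M\t*\t0\t0\t*\t*",
]

summary = mapping_summary(toy_sam)
mapped_fraction = summary["mapped"] / summary["total"]
confident_fraction = summary["confident"] / summary["total"]
```

A large gap between the two fractions would point towards repetitive sequence, whereas a low overall mapped fraction is more suggestive of adapters, contamination or poor read quality.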

ADD REPLY • link 7.5 years ago by fraseriainyoung • 0

0

Thank you for the additional explanations!

Since you are talking about a transcription factor, I would tend to think there was some problem with the enrichment. An alternative is that there is no binding under the experimental conditions you are investigating. Or maybe binding is cell-specific, so that what you see when sequencing a pooled sample of cells looks like a mess (OK, I had too much time to think about possible explanations).

In our case we were ChIPping a chromatin modification for which it's reasonable to expect a very diffuse signal. We ended up using epic (a diffuse-domain ChIP-seq caller based on SICER) instead of MACS2, which gave reasonable results. However, if you expect a sharp binding profile (as is the case for transcription factors in general, at least to the best of my knowledge), this won't help.

This is a post I came across while trying to understand my own data, maybe there is something relevant to you.

ADD REPLY • link 7.5 years ago by Marge ▴ 320

score 0 · Answer 2 · 2016-10-27

0

7.5 years ago

colin.kern ★ 1.1k

You shouldn't expect your input to be flat. Due to various factors in the physical structure of the genome, you'll get more fragments from some places, independently of your IP. A set of control reads that are sheared but not IPed gives peak-calling programs a background against which to distinguish these "phantom peaks" from real ChIP peaks.
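
To make the role of the control concrete, here is a deliberately simplified Python sketch of testing one window against an input-derived background. It is not the MACS2 algorithm: it scales the control count by library size, takes that as the Poisson rate, and asks how surprising the IP count is. The `background_lambda` floor and all counts are assumptions for illustration.

```python
from math import exp

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), by summing the lower tail."""
    if k <= 0:
        return 1.0
    term = exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def window_pvalue(ip_count, control_count, ip_total, control_total,
                  background_lambda=1.0):
    """P-value for the IP count in one window given the control count.

    The control count is scaled to the IP library size; a small floor
    (background_lambda, an assumed value) avoids a zero expected rate.
    """
    scale = ip_total / control_total
    lam = max(control_count * scale, background_lambda)
    return poisson_upper_tail(ip_count, lam)

# Toy numbers (assumed): equal library sizes of one million reads.
# A pile-up that is just as tall in the input is not significant...
p_shared = window_pvalue(50, 48, 1_000_000, 1_000_000)
# ...while the same IP pile-up over a low input background is.
p_enriched = window_pvalue(50, 10, 1_000_000, 1_000_000)
```

This is why a strong peak that appears at equal height in both tracks is not reported as a binding site: relative to the control, there is nothing left to explain.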

I'd guess that your IP isn't working. Have you used phantompeakqualtools to calculate NSC and RSC scores? You can also check out deepTools, which has a lot of useful functions for analyzing and visualizing your sequencing data.
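
For reference, NSC and RSC are both ratios taken from the strand cross-correlation profile. The Python sketch below computes them from a precomputed profile; it is not phantompeakqualtools itself. The profile values, read length and fragment length are toy assumptions, and `nsc_rsc` is a hypothetical helper name.

```python
def nsc_rsc(cross_corr, read_length, fragment_length):
    """Compute (NSC, RSC) from a strand cross-correlation profile.

    cross_corr maps strand shift in bp to the cross-correlation value.
    NSC = cc(fragment length) / min(cc)
    RSC = (cc(fragment length) - min(cc)) / (cc(read length) - min(cc))
    """
    cc_min = min(cross_corr.values())
    cc_frag = cross_corr[fragment_length]
    cc_phantom = cross_corr[read_length]  # the read-length "phantom" peak
    if cc_min <= 0 or cc_phantom == cc_min:
        raise ValueError("profile does not allow NSC/RSC to be computed")
    nsc = cc_frag / cc_min
    rsc = (cc_frag - cc_min) / (cc_phantom - cc_min)
    return nsc, rsc

# Toy profile (assumed): the phantom peak at the read length (75 bp)
# is taller than the peak at the fragment length (150 bp).
profile = {0: 0.20, 75: 0.30, 150: 0.25, 200: 0.22, 400: 0.20}
nsc, rsc = nsc_rsc(profile, read_length=75, fragment_length=150)
```

An RSC below 1, as in this toy profile, means the read-length peak dominates the fragment-length peak. A commonly cited rule of thumb is NSC above about 1.05 and RSC above about 0.8 for a successful ChIP, though these are guidelines rather than hard cut-offs.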

ADD COMMENT • link 7.5 years ago by colin.kern ★ 1.1k

0

Thanks for the tools Colin. I'm going to try running my sequences through these to get some more useful QC data.

ADD REPLY • link 7.5 years ago by fraseriainyoung • 0