Different number of peaks between biological replicates?
4.5 years ago

Hi All

I ran a ChIP-seq analysis using both Bowtie2 and BWA-MEM for alignment. Peaks were called with IgG as the control. After peak calling I see one trend: replicate 2 (marked by -2) has less than half the number of peaks of replicate 1 (marked by -1).

This is weird because biological replicates must have a relatively equal number of peaks.

Alignment with BWA
Sample        No. of peaks
HTH3          87299
HTK27AC-1     11196
HTK27AC-2     428
HTK27ME3-1    35341
HTK27ME3-2    10286
HTK4ME1-1     93845
HTK4ME1-2     1420

Alignment with Bowtie2
Sample        No. of peaks
HTH3          17944
HTK27AC-1     6259
HTK27AC-2     465
HTK27ME3-1    20044
HTK27ME3-2    7773
HTK4ME1-1     9600
HTK4ME1-2     761

Is it normal to see this trend? I checked the literature, but people mostly report common peaks between biological replicates and, at the same time, roughly the same number of peaks.

sequencing ChIP-Seq replicates • 1.8k views
4.5 years ago
ATpoint 82k

"This is weird because biological replicates must have a relatively equal number of peaks."

No, they don't, but probably should. Raw peak numbers are strongly influenced by immunoprecipitation efficiency, signal-to-noise ratio and sequencing depth, so it is not unusual to get quite different numbers between replicates. That is exactly why I personally find it pointless to compare raw peak numbers.

The H3K27ac in my experience is especially problematic as it always (edit: in the datasets I've seen from primary specimens) gives rather poor quality data. H3K4me1 is typically better (= more specific in terms of signal-to-noise ratio). In your second replicate you might have had issues with crosslinking efficiency, cell viability or antibody coupling efficiency; there are a lot of possible reasons. ChIP is always a problematic experiment as it is so antibody-dependent.

I always roll my eyes when I see papers making statements like "condition 1 shows 30% more peaks than condition 2". Statements like this should be based on a proper differential analysis. That means merge all peaks, create a count matrix for all conditions and then feed this into tools like DESeq2 or edgeR. If you then see significantly increased counts for one condition over the other in a notable number of peaks, you can make such statements. If not, any fluctuation of peak numbers might be a function of IP efficiency or depth, or the peaks might be small and spurious. The latter is especially important if data quality (e.g. due to antibody efficiency) is an issue: a sample with slightly better quality might get more peaks at borderline significance, while a sample with reduced quality might not. This is still not too informative about the actual biology. It only (in my very humble opinion) matters if these initial observations survive a proper analysis that takes into account dispersion between replicates etc. after a meaningful normalization for read depth and library composition.
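To make the "merge all peaks" step concrete, here is a minimal Python sketch of building a union peak set. It is only an illustration under assumptions: peaks are taken to be (chrom, start, end) tuples in BED-style coordinates, and `merge_peaks` plus the toy intervals are made up for this example. On real BED files a tool such as bedtools merge would normally do this job.

```python
from collections import defaultdict


def merge_peaks(peak_sets):
    """Merge overlapping or touching intervals from several peak lists
    into one union (consensus) peak set, sorted by chromosome and start."""
    by_chrom = defaultdict(list)
    for peaks in peak_sets:
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if cur_start is None:
                cur_start, cur_end = start, end
            elif start <= cur_end:
                # Overlaps (or touches) the current interval: extend it.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


# Toy peak calls for two hypothetical replicates (not real data).
rep1 = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 50, 120)]
rep2 = [("chr1", 150, 260), ("chr2", 300, 380)]
consensus = merge_peaks([rep1, rep2])
print(consensus)
```

The point of the union set is that every sample is then quantified over the same regions, so a replicate with fewer called peaks still contributes counts everywhere.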


Hi,

"The H3K27ac in my experience is especially problematic as it always gives rather poor quality data. H3K4me1 is typically better (= more specific in terms of signal-to-noise ratio)."

Is there any reference for this statement? IMO, poor quality must depend on IP efficiency, antibody quality or general library preparation steps, not on which histone modification is pulled down.


I cannot give any reference, but this is what I typically see in the multiple murine and human primary datasets I have analyzed. I guess this is due to the antibody in combination with low(er) input material, as the samples in which I've seen this were always ex vivo from primary donors. In these low-input primary data, H3K27ac FRiPs were typically 1-5%, with fewer than 10k callable peaks. In contrast, we produced the same data from comparable cell lines with millions of cells as starting material, giving FRiPs of 10-30%. So I cannot fully blame the antibody for it; it is probably a combination of low input, antibody quality, protocol, sequencing depth etc.
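For readers unfamiliar with the metric: FRiP (fraction of reads in peaks) is the number of reads falling inside called peaks divided by the total number of mapped reads. A minimal sketch, assuming each read is reduced to a single (chrom, position) point and peaks are non-overlapping half-open intervals; the function name `frip` and the toy data are illustrative only.

```python
from bisect import bisect_right


def frip(reads, peaks):
    """Fraction of reads whose position lies inside any peak.

    reads: list of (chrom, position)
    peaks: list of non-overlapping (chrom, start, end), end exclusive
    """
    # Per-chromosome sorted start/end arrays for binary search.
    index = {}
    for chrom, start, end in sorted(peaks):
        starts, ends = index.setdefault(chrom, ([], []))
        starts.append(start)
        ends.append(end)

    in_peaks = 0
    for chrom, pos in reads:
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        i = bisect_right(starts, pos) - 1  # last peak starting at or before pos
        if i >= 0 and pos < ends[i]:
            in_peaks += 1
    return in_peaks / len(reads) if reads else 0.0


# Toy example (not real data): 2 of 5 reads fall inside a peak.
toy_peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
toy_reads = [("chr1", 150), ("chr1", 250), ("chr1", 500), ("chr1", 600), ("chr2", 10)]
toy_frip = frip(toy_reads, toy_peaks)
print(f"FRiP = {toy_frip:.2f}")
```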


"That means merge all peaks, create a count matrix for all conditions and then feed this into tools like DESeq2 or edgeR."

Which "all conditions" are you referring to? How can I make the count matrix? Please give me a hint, as I am new to ChIP-seq analysis. Something like this?
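As a rough picture of what such a count matrix looks like: one row per merged peak and one column per sample, where, on one reading of the quoted sentence, "all conditions" means every replicate of every mark being compared. The sketch below is an assumption-laden toy, with reads reduced to (chrom, position) points and the names `count_matrix`, `rep1` and `rep2` invented for illustration. On real BAM files, tools such as featureCounts or bedtools multicov are commonly used instead.

```python
def count_matrix(peaks, samples):
    """Count reads per peak for every sample.

    peaks:   list of (chrom, start, end), end exclusive
    samples: dict mapping sample name -> list of (chrom, position)
    Returns (column_names, matrix) with one row per peak.
    """
    names = sorted(samples)
    matrix = []
    for chrom, start, end in peaks:
        row = [
            sum(1 for c, pos in samples[name] if c == chrom and start <= pos < end)
            for name in names
        ]
        matrix.append(row)
    return names, matrix


# Toy merged peaks and reads for two hypothetical replicates (not real data).
merged_peaks = [("chr1", 100, 260), ("chr1", 500, 650)]
toy_samples = {
    "rep1": [("chr1", 120), ("chr1", 130), ("chr1", 510)],
    "rep2": [("chr1", 255), ("chr1", 700)],
}
columns, counts = count_matrix(merged_peaks, toy_samples)
print(columns)
for peak, row in zip(merged_peaks, counts):
    print(peak, row)
```

A table of this shape, with raw integer counts, is the kind of input DESeq2 and edgeR expect.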

4.5 years ago

What if you merge the .bed files, and quantify counts within the peaks?

Do you get a high correlation in the quantifications?

In your case, I think there is some bigger difference (for HTH3), but that is what I would probably check if the peak counts were similar but in different positions (like for the other antibodies).
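A minimal sketch of the correlation check suggested here, assuming you already have read counts for both replicates over the same merged peaks. Depth scaling to counts per million and the log2 transform with a pseudocount are choices made for this illustration, not something prescribed above; `log_cpm`, `pearson` and the toy counts are made up.

```python
import math


def log_cpm(counts, pseudocount=1.0):
    """Scale counts to counts per million, then log2-transform."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + pseudocount) for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy counts over the same four merged peaks (not real data).
# rep2 has half the depth of rep1 but the same relative signal.
rep1_counts = [200, 50, 10, 400]
rep2_counts = [100, 25, 5, 200]
r = pearson(log_cpm(rep1_counts), log_cpm(rep2_counts))
print(f"Pearson r = {r:.3f}")
```

In this toy case the correlation is essentially 1 even though a depth-sensitive peak caller could report very different peak numbers for the two libraries, which is the scenario the check is meant to detect.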

For ATAC-Seq data, I found it helpful to use the Bowtie2 --local option to increase the alignment rate. You can also run Picard to get an idea of whether the insert size distribution looks different between the alignments.

However, for ATAC-Seq data, the alignment rate was very different for default BWA-MEM and Bowtie2. Is the alignment rate also different for your samples, or do you have a similar alignment rate and a different number of peaks?

Also, for histone modifications, I used HOMER findPeaks with -style histone. However, I don't think it changed things as much as you described (for HTH3).

Finally, is the total number of reads similar between your replicates? That can also have an effect on the number of peaks called (and ATpoint also mentions read depth and read count comparisons).
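One way to test whether depth alone explains the gap is to subsample every library to the size of the smallest one before calling peaks again. This is an added illustration rather than a step described in this thread; it assumes reads are held in plain Python lists, and `downsample_to_common_depth` is a hypothetical helper name.

```python
import random


def downsample_to_common_depth(libraries, seed=0):
    """Randomly subsample each library to the size of the smallest one.

    libraries: dict mapping sample name -> list of reads
    A fixed seed keeps the subsampling reproducible.
    """
    rng = random.Random(seed)
    target = min(len(reads) for reads in libraries.values())
    return {name: rng.sample(reads, target) for name, reads in libraries.items()}


# Toy libraries of unequal depth (read IDs stand in for alignments).
toy_libraries = {"rep1": list(range(1000)), "rep2": list(range(300))}
equalized = downsample_to_common_depth(toy_libraries)
print({name: len(reads) for name, reads in equalized.items()})
```

If the peak numbers stay very different at equal depth, the cause is more likely signal-to-noise than sequencing depth.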


I will check that and will let you know.

In my correlation analysis, however, I observed a high correlation between HTH3 and the rest of the modifications. The explanation given to me was that, since these modifications occur on the tails of H3 histones, a high correlation between all three modifications and H3 is expected. Also, the peaks I get for each modification should theoretically be a subset of the H3 peaks.

4.5 years ago
colin.kern ★ 1.1k

Is this data from cell lines or frozen tissue? ChIP-seq can be a very hard assay to get consistent data from, especially with tissue, because there can be very large differences in the signal-to-noise ratio even when following the exact same protocol. Seeing that it is consistently the second replicate that has much lower peak numbers, it could be a difference in how well the chromatin fixation and shearing step worked, assuming the IPs for all the marks were done from aliquots of the same shearing product. Otherwise it may be the actual tissue sample is more degraded for replicate 2.
