Noisy ChIP-seq peaks
7 months ago

Hi all,

I am trying to understand why my peaks look noisy compared with some references from ENCODE. Any thoughts would be highly appreciated.

I have a group of treated (n = 7) vs untreated (n = 7) samples that were immunoprecipitated with an H3K4me3 antibody and sequenced on an Illumina HiSeq machine. I used bowtie to map the paired-end reads to the hg19 genome and samtools to create .bam files. For peak calling I used MACS2. To make bigwig files I used both deeptools and UCSC bedGraphToBigWig.

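# macs2
# ___ peak calling
# No sample has its own matched input, so all available inputs are pooled as control.
# Note: with -f BAMPE, MACS2 takes fragment sizes from the read pairs,
# so --nomodel and --extsize 147 should have no effect here.
macs2 callpeak -t sample1.bam \
  -c input_sample1.bam input_sample3.bam input_sample7.bam \
  -f BAMPE \
  -g hs --SPMR --keep-dup auto --outdir ./outputs -n sample1_macs2_call \
  -B -q 0.01 --trackline --nomodel --extsize 147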

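#___ converting bdg to bigwig
# callpeak above writes into ./outputs using the prefix sample1_macs2_call
macs2 bdgcmp -t ./outputs/sample1_macs2_call_treat_pileup.bdg \
  -c ./outputs/sample1_macs2_call_control_lambda.bdg \
  -m FE \
  -o sample1_FE.bdg

#_ and
# for bdgcmp, -p is the pseudocount, not a p-value cutoff
macs2 bdgcmp -t ./outputs/sample1_macs2_call_treat_pileup.bdg \
  -c ./outputs/sample1_macs2_call_control_lambda.bdg \
  -m logLR -p 0.00001 \
  -o sample1_logLR.bdg

#_finally
# LC_ALL=C gives the byte-order chromosome sort that bedGraphToBigWig expects
LC_ALL=C sort -k1,1 -k2,2n sample1_FE.bdg > sample1_FE_sorted.bdg
LC_ALL=C sort -k1,1 -k2,2n sample1_logLR.bdg > sample1_logLR_sorted.bdg
bedGraphToBigWig sample1_FE_sorted.bdg hg19.sizes sample1_FE.bw
bedGraphToBigWig sample1_logLR_sorted.bdg hg19.sizes sample1_logLR.bw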

#___ making bigwig by deeptools 

bamCoverage -b sample1.bam \
-o sample1.bw \
--binSize 20 \
--normalizeUsing BPM \
--smoothLength 60 \
--extendReads 150 \
--centerReads \
-p 12 

Here is an IGV screenshot showing what the peaks look like. For comparison, I added two samples, one from ENCODE and one from GSE120339. As you can see, there are several peaks (noise?) in the first three tracks; I marked some of them with red arrows. Also, for the on-target peaks (here, those overlapping promoters), the peaks in the first three tracks (my samples) look different from those in the control tracks (the last two tracks).

Check this link to see a bigger version of the image.

Please let me know if I am doing something wrong... Thanks

Koli

deeptools macs2 chip-seq peaks igv • 919 views

It might be helpful to autoscale each track so that the maximum height is set to the highest point in the signal. This would help you judge whether those peaks are "real" or not.

Thanks Jared, switching to auto-scale made the peaks look much better, especially the on-target peaks, BUT some of the off-target peaks (noise?) remained unchanged, and I still see them as vertical blue bars (as shown in the screenshot). I was wondering about issues in upstream steps, such as non-specific antibody binding. Do you think this kind of issue can produce those long vertical bars (peaks? noise?)?

Absolutely. Depending on antibody quality (specificity), the number of cells used for the experiment, and the abundance of the protein target, ChIP can have a wide range of signal-to-noise ratios. This is why a good peak caller is key: one that takes into account an input (or IgG) control, if you have it, and/or the signal in the vicinity of candidate peaks. MACS does that; see the paper for details on how it works (local lambda method). Unspecific binding can come from the background binding preference of the antibody, which is why people often include an IgG control, i.e. an antibody without a specific target that simply binds unspecifically based on its isotype. You will often see spurious peaks in IgG that are somewhat similar to those in the actual ChIP sample (but much smaller), because ChIP peaks are often in open chromatin, which is rich in all kinds of proteins and is therefore an "attractive" target for unspecific IgG binding. You can also have random DNA fragments that made it into the library prep, e.g. DNA that stuck unspecifically to the beads used for the antibody pulldown. There are lots of sources of noise in ChIP (or any NGS assay).

You can also utilize the ENCODE blacklist to ignore/remove regions that are known to be artifacts.
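
A minimal sketch of that blacklist filtering, assuming both the peak file and the blacklist are tab-separated with chrom, start, and end in the first three columns. The function name `filter_blacklist` and the blacklist file name in the usage line are hypothetical; plain awk is used here only so the example has no dependencies.

```shell
#!/usr/bin/env bash
# Sketch: print only the peaks that do not overlap any blacklisted interval.
# Usage: filter_blacklist <peaks file> <blacklist BED>
filter_blacklist() {
  local peaks="$1" blacklist="$2"
  awk 'BEGIN { FS = OFS = "\t" }
       # First file (blacklist): store the intervals per chromosome.
       FILENAME == ARGV[1] {
         n[$1]++
         s[$1, n[$1]] = $2 + 0
         e[$1, n[$1]] = $3 + 0
         next
       }
       # Second file (peaks): skip track/comment lines, then keep a peak
       # only if it overlaps none of the stored intervals (half-open coordinates).
       /^track/ || /^#/ { next }
       {
         keep = 1
         for (i = 1; i <= n[$1]; i++)
           if ($2 + 0 < e[$1, i] && $3 + 0 > s[$1, i]) { keep = 0; break }
         if (keep) print
       }' "$blacklist" "$peaks"
}
```

For example, `filter_blacklist ./outputs/sample1_macs2_call_peaks.narrowPeak hg19_blacklist.bed > sample1_peaks.filtered.narrowPeak`. If bedtools is installed, `bedtools intersect -v -a peaks.narrowPeak -b blacklist.bed` should give the same kind of result and will be faster on large files.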

7 months ago
seidel 8.3k

Based on your picture, I'd say it's a matter of scale. Two of your tracks have red arrows pointing to peaks on a scale 1000-fold smaller than the HeLa track, and your other track is on a scale 20-fold smaller than the ENCODE sample. If you look at the control tracks on a scale of 0-1, you will likely see all kinds of "noise". As @jared.andrews07 suggests, let IGV autoscale your tracks, or at least put your tracks on a scale comparable to your controls. Your scale is so small that I wouldn't be surprised if your red arrows were pointing to individual reads. Did MACS return peaks? Did MACS return peaks under your red arrows? If MACS did return peaks for your data set, examine the distribution of q-values and visually compare peaks with low versus high q-values to see if you can get a better definition of noise. (Remember that MACS reports p- and q-values as -log10.)

"As you can see there are several peaks(noise?) in the first three tracks."

Are you calling these peaks? Or is MACS calling these peaks? Load the BED file into IGV, and explicitly examine what MACS is calling a peak, and what it thinks is a strong peak versus a weak peak.
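
One rough way to look at the q-value distribution mentioned in this answer is to bin column 9 of the narrowPeak file, which holds -log10(q-value). This is only a sketch: the function name `qvalue_histogram` and the default bin width of 5 are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: histogram of -log10(q) values from a MACS2 narrowPeak file.
# Usage: qvalue_histogram <narrowPeak file> [bin width, default 5]
# Output: two tab-separated columns, bin start and number of peaks.
qvalue_histogram() {
  local peaks="$1" width="${2:-5}"
  awk -v w="$width" 'BEGIN { FS = "\t" }
       # Skip any track or comment lines, if present.
       /^track/ || /^#/ { next }
       # Column 9 of narrowPeak is -log10(q-value); assign it to a bin.
       { b = int($9 / w) * w; count[b]++ }
       END { for (b in count) printf "%d\t%d\n", b, count[b] }' "$peaks" |
    sort -k1,1n
}
```

For example, `qvalue_histogram ./outputs/sample1_macs2_call_peaks.narrowPeak 5`. A large pile of peaks in the lowest bins, just above the -q cutoff, would be the ones to inspect first in IGV.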

Thanks Seidel, really helpful points. Using the auto-scale option, visualizing the bigwig files, adding the narrowPeak files, and also adding customized bed files (one bed file for peaks with a q value lower than 5, and one for peaks with a q value greater than 10) to IGV helped me get a deeper insight into the peak calls.
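
For anyone wanting to reproduce the two custom bed files, a sketch along these lines should work, assuming the thresholds 5 and 10 refer to the -log10(q-value) in column 9 of the narrowPeak file. The function name `split_peaks_by_q` and the output file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: split a narrowPeak file into "weak" and "strong" 4-column BED files
# based on -log10(q) in column 9.
# Usage: split_peaks_by_q <narrowPeak> <weak.bed> <strong.bed> [weak_max=5] [strong_min=10]
split_peaks_by_q() {
  local peaks="$1" weak_out="$2" strong_out="$3"
  local weak_max="${4:-5}" strong_min="${5:-10}"
  # Start from empty output files so a rerun never mixes in old results.
  : > "$weak_out"
  : > "$strong_out"
  awk -v lo="$weak_max" -v hi="$strong_min" \
      -v weak="$weak_out" -v strong="$strong_out" '
    BEGIN { FS = OFS = "\t" }
    # Skip any track or comment lines, if present.
    /^track/ || /^#/ { next }
    # Peaks below the lower threshold go to the weak file.
    $9 + 0 < lo { print $1, $2, $3, $4 > weak }
    # Peaks above the upper threshold go to the strong file.
    $9 + 0 > hi { print $1, $2, $3, $4 > strong }
  ' "$peaks"
}
```

For example, `split_peaks_by_q ./outputs/sample1_macs2_call_peaks.narrowPeak sample1_weak.bed sample1_strong.bed`, then load both bed files into IGV next to the bigwig tracks. Peaks between the two thresholds are deliberately left out of both files.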
