DiffBind issue in identification of differentially enriched peaks between two conditions
Entering edit mode
6 weeks ago
accibio ▴ 20

Hi. This is my first attempt at trying to analyze ChIP-Seq data. We are working on two proteins BAZ1A (aka ACF1) and SMARCA5 (aka SNF2H) which are subunits of the ISWI chromatin remodelling complex. BAZ1A recognizes acetylated lysines on histones whereas SMARCA5 is the ATPase subunit which interacts with BAZ1A and aids in chromatin remodelling, thereby controlling the activation/repression of genes. I found this paper where ChIP-Seq was performed on these two proteins in two conditons in mice - Control and Susceptible. My objectives are:

  1. To identify the binding sites of BAZ1A/SMARCA5 on one of their downstream target genes (reported previously) in both control and susceptible conditions. Since they are known to bind together I would expect similar binding sites.
  2. To show the differential binding of BAZ1A/SMARCA5 on its target gene in Control Vs Susceptible condition.

Control condition: 1 input sample, 3 replicates for BAZ1A and 3 replicates for SMARCA5.

Susceptible condition: 1 input sample, 3 replicates for BAZ1A and 3 replicates for SMARCA5.

For the ChIP-Seq data analysis I picked up some ideas from tutorial and tutorial and these are the steps that I followed. Please advise me if there's any step which needs correction/improvement.

  1. Quality check using FastQC. (Everything looks fine)
  2. Read alignment to mm10 reference genome using Bowtie2 in the local mode (--local). (Overall alignment rate of all samples ranges from 99.6% to 99.9%)
  3. Conversion of SAM to BAM and sorting BAM files using SAMtools.
  4. Removal of duplicates from BAM files using Picard.
  5. Removal of multi-mapped and unmapped reads using Sambamba.
  6. Filtering out reads aligning to blacklisted regions using bedtools.
  7. Strand cross-correlation assessment of blacklist filtered BAM files using Phantompeakqualtools (NSC > 1.05 and RSC > 0.8 obtained for all samples) and cumulative enrichment and clustering using deepTools.
  8. Peak calling for all replicates individually using epic2. (I found a tutorial suggesting depth normalization i.e. downsampling of reads to make the input and ChIP samples have equal number of reads but I skipped that step. At this point I have genomic coordinates of peaks obtained for each BAZ1A/SMARCA5 replicate in control and susceptible conditions as BED files)
  9. Peak quality assessment using ChIPQC. (RiP for all samples ranges from 3.9% to 14%)
  10. Merging BAM files of replicates using SAMtools and using the merged BAM files to make bigWig files, profile plots and heatmaps using deepTools. (From the profile plots, BAZ1A seems to have an enriched binding at the TSS of genes in susceptible condition whereas the enrichment is not too drastic in case of SMARCA5. This is in line with the results reported in the paper (Fig. 1 and Fig. 4))

    # Creating bigWig files
    bamCompare -b1 Con_BAZ1A.bam -b2 Con_Input.bam --scaleFactorsMethod None -o Con_BAZ1A.bw --binSize 10 --normalizeUsing BPM --smoothLength 50 --extendReads 150 --centerReads 2> Con_BAZ1A.log
    ...and so on for Con_SMARCA5, Sus_BAZ1A and Sus_SMARCA5         
    # Creating matrices
    computeMatrix reference-point --referencePoint TSS -b 2000 -a 2000 -R gencode.vM34.primary_assembly.annotation.bed -S *_BAZ1A.bw --skipZeros -o Matrix_BAZ1A.gz --outFileSortedRegions Regions_BAZ1A.bed
    ...and so on for SMARCA5
    # Making profile plots
    plotProfile -m Matrix_BAZ1A.gz -out BAZ1A_Profile.png --perGroup --colors blue red --samplesLabel "BAZ1A (Control)" "BAZ1A (Susceptible)" --plotTitle "BAZ1A read density" --refPointLabel "TSS" -z ""
    ...and so on for SMARCA5
    # Making heatmaps
    plotHeatmap -m Matrix_BAZ1A.gz -out BAZ1A_Profile_Heatmap.png --colorMap RdBu --zMin -0.005 --zMax 0.01
    plotHeatmap -m Matrix_SMARCA5.gz -out SMARCA5_Profile_Heatmap.png --colorMap RdBu --zMin 0.005 --zMax 0.02
  1. Differential enrichment of peaks in Control Vs Susceptible condition using DiffBind.

     samples <- dba(sampleSheet = "BAZ1A_samplesheet.csv",
                    minOverlap = 1)
     > samples
     6 Samples, 153192 sites in matrix:
                 ID Tissue Factor   Condition Replicate Intervals
     1 Con_BAZ1A_R1    NAc  BAZ1A     Control         1     32372
     2 Con_BAZ1A_R2    NAc  BAZ1A     Control         2     47734
     3 Con_BAZ1A_R3    NAc  BAZ1A     Control         3     34584
     4 Sus_BAZ1A_R1    NAc  BAZ1A Susceptible         1     53725
     5 Sus_BAZ1A_R2    NAc  BAZ1A Susceptible         2     44714
     6 Sus_BAZ1A_R3    NAc  BAZ1A Susceptible         3     47925
     counted <- dba.count(DBA = samples)
     counted <- dba.contrast(DBA = counted, 
                            categories = DBA_CONDITION)
     analysed <- dba.analyze(DBA = counted,
                            method = DBA_DESEQ2)
     report <- dba.report(DBA = analysed)
     report <- as.data.frame(report)
     > report[report$Fold > 0,]
     seqnames   start     end width strand     Conc Conc_Susceptible Conc_Control     Fold     p.value        FDR
     chr2 8978892 8979292   401      * 2.584173         3.584173            0 2.641511 0.000712058 0.04231885

This is where I'm currently stuck. The paper reports enhanced binding of BAZ1A in susceptible condition whereas I'm getting the opposite result. Out of 424 differential peaks identified, only 1 peak is positively enriched in the susceptible condition. I'm not able to figure out the reason. Am I missing something or did I make a mistake at any step? Please help.

DiffBind ChIP-Seq • 315 views
Entering edit mode

What's the MA plot look like? Would help indicate if there's a general shift or if signal just isn't varying that much between conditions. I am not really sold by that paper's claims - their supplemental figures where they should actual ChIP data seem pretty weak and cherry-picked to me.

And kudos for the detailed question, it makes it much easier to try to help people out. There's nothing you've done that sticks out to me as a glaring mistake. I typically don't remove duplicates from ChIP-seq, as it caps the signal range, but I highly doubt that's causing any issues here.

Entering edit mode

Thanks Jared. The MA plot looks like this.


Login before adding your answer.

Traffic: 1860 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6