Diffbind or DESEQ2 Chipseq data
2
0
Entering edit mode
4 days ago
Irene • 0

Hi everyone, I have a methodological question regarding differential analysis of ChIP-seq data. From what I understand, the general workflow is to first define peak regions using MACS3, and then assess differential acetylation using DiffBind. In this case, DiffBind counts the reads using both the .broadPeak or .narrowPeak files predicted by MACS, as well as the .bam files containing the aligned reads, and performs the counting. DiffBind also uses the input files to establish a baseline level of acetylation.

Now, in my case, I have a list of regions for which I need to determine whether they are differentially acetylated. How could I approach this? For example, should I check whether my regions fall within the peaks predicted by MACS and then perform a differential analysis with DESeq2? Or could I extrapolate the results obtained from DiffBind and extract the predicted acetylation levels from DiffBind for my regions of interest? Do these regions need to be 100% overlapping?

In summary, how can I establish differential acetylation in regions that are not those predicted by MACS3? I mention this because I have seen some studies that take the ChIP-seq .bam data and perform DESeq2 directly, but that approach does not consider the input files or the MACS predicted peaks.

For example, in this image, the red area represents the differentially acetylated region returned by DiffBind. The green area is a promoter, the grey tracks correspond to the BAM files (coverage), and the blue ones are the peaks predicted by MACS. If I’m interested in whether the promoter is differentially acetylated, should I say yes, because it falls within the red region? And approximately how much? Should I take the value from DiffBind?

enter image description here

I’d really appreciate any suggestions, references, or opinions you can share on this — thank you very much for your help!

DESEQ2 Diffbind Chip-Seq • 388 views
ADD COMMENT
1
Entering edit mode
4 days ago
james.hawley ▴ 90

Yes, the general workflow you describe is right. Unlike RNA-seq reads that come from well-annotated genes with defined coordinates, ChIP-seq data comes from places in the genome that aren't necessarily previously defined. In data science language, RNA-seq has a predefined "feature space" whereas ChIP-seq does not. So you have to use a tool like MACS, first, to find where these peaks are, which then defines your feature space.

But, if you already have a predefined list of regions (based on some previous experiments or other biological hypothesis you're following), then you may not need to call peaks with MACS, first. You can just take that list, load it into R as a BED file, and supply it to DiffBind directly. Because you're interested in this promoter you mention, I'd recommend just using that region directly. There is some confounding and significance amplification that happens when you use ChIP-seq reads to call peaks, and then use that same data to look for differences between the peaks that you just called, so you can avoid that issue entirely by using the predefined region(s) of interest.

DiffBind uses DESeq2 (or edgeR) under the hood for its differential testing (see the "Details of DESeq2 analysis" section of DiffBind's documentation), so it doesn't particularly matter which one you use. DiffBind provides some wrappers and other functionality that make it easier to use DESeq2 with ChIP-seq data, instead of RNA-seq which DESeq2 was designed for.

ADD COMMENT
0
Entering edit mode

Thank you for your answers, i only have a doubt. Since my list of enhancers/or promoters is a BED file with three columns (chr, start, end), how can I use it directly with DiffBind, which requires a broadPeak file? How can I recover the signal so that DiffBind can use it? with dba.count(..., summits=500) parameter?

ADD REPLY
0
Entering edit mode

The broadPeak file format is basically a BED file with 9 columns. You can fill in the score and other columns with empty data of some kind to make it work with DiffBind.

ADD REPLY
1
Entering edit mode
4 days ago
LChart 5.1k

Now, in my case, I have a list of regions for which I need to determine whether they are differentially acetylated. How could I approach this? For example, should I check whether my regions fall within the peaks predicted by MACS and then perform a differential analysis with DESeq2? Or could I extrapolate the results obtained from DiffBind and extract the predicted acetylation levels from DiffBind for my regions of interest? Do these regions need to be 100% overlapping?

This depends on how you want to frame the analysis. In the promoter-centric approach, you would directly pass a set of promoters into DiffBind for testing, resulting in a differential acetylation top table, and you can then look up your promoters of interest. (You can do the same for gene bodies and enhancers - if you have those defined as well). In a hypothesis-free approach, you could use peak calling (or - my preference - csaw) to define differential regions, and use genebody/promoter/enhancer as annotations for each region.

Both approaches are valid. The "tunnel vision" associated with the promoter-centric approach is balanced out by the fact that the downstream analysis is very straightforward; while dealing with regions necessitates making ugly decisions about regions with multiple annotations, partial overlaps between features and regions, etc.

Given you have a pre-selected list of regions to test, it sounds like the first approach may be the most straightforward in your case.

ADD COMMENT

Login before adding your answer.

Traffic: 3110 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6