1
2
Entering edit mode
5.2 years ago
Yu ▴ 110

Dear all,

I am confused with the --broad-cutoff option in macs2. In the manual, it said that --broad-cutoff is cutoff for broad region. This option is not available unless --broad is set. If -p is set, this is a pvalue cutoff, otherwise, it's a qvalue cutoff. DEFAULT: 0.1

If I understand it correctly, if I only set the --broad-cutoff 0.01, the cutoff will filter out broad peak with q-value > 0.01. However, when I processing a histone modification ChIP-seq data, I got some result with q-value > 0.01 [-log10(qvalue) <2].

Here is the result file.

    # This file is generated by MACS version 2.1.0.20150731
# Command line: callpeak -t /mouseAnalysis/data/align/CYJ-1_dedup.bam -c  /mouseAnalysis/data/align/CYJ-1i_dedup.bam -f BAM -n CYJ-1_CYJ-1i_macs.tmp -g mm -B --broad --broad-cutoff 0.01
--fix-bimodal --nomodel --extsize=270 --outdir  /mouseAnalysis/data/peakCall/
# ARGUMENTS LIST:
# name = CYJ-1_CYJ-1i_macs.tmp
# format = BAM
# ChIP-seq file = ['/mouseAnalysis/data/align/CYJ-1_dedup.bam']
# control file = ['/mouseAnalysis/data/align/CYJ-1i_dedup.bam']
# effective genome size = 1.87e+09
# band width = 300
# model fold = [5, 50]
# qvalue cutoff for narrow/strong regions = 5.00e-02
# qvalue cutoff for broad/weak regions = 1.00e-02
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is on

# tag size is determined as 49 bps
# total tags in treatment: 8279763
# tags after filtering in treatment: 8279763
# maximum duplicate tags at the same position in treatment = 1
# Redundant rate in treatment: 0.00
# total tags in control: 9357359
# tags after filtering in control: 9357359
# maximum duplicate tags at the same position in control = 1
# Redundant rate in control: 0.00
# d = 270
chr     start   end     length  pileup  -log10(pvalue)  fold_enrichment -log10(qvalue)  name
chr11   3132147 3132916 770     46.55   5.11039 1.29032 1.91284 CYJ-1_CYJ-1i_macs.tmp_peak_1
chr11   3168653 3168940 288     39.21   4.37530 1.18763 1.48268 CYJ-1_CYJ-1i_macs.tmp_peak_2
chr11   3193021 3193581 561     46.44   5.97673 1.34145 2.72753 CYJ-1_CYJ-1i_macs.tmp_peak_3


My questions are:

Do I understand the --broad-cutoff correctly? Why the filter doesn't work?

Thanks.

ChIP-Seq MACS2 macs2 broadpeak broad-cutoff • 6.0k views
0
Entering edit mode

Hello, I am a little confused if we set q values already, why we need the --broad-cutoff to filter again, especially when you use q and cutoff both at 0.1? Thanks!

0
Entering edit mode

This question is 5 years old. If you have a question, consider opening a new one rather than trying to piggyback. Note that MACS has also been updated since this question was posted and parameters may have changed.

0
Entering edit mode

I think at that time, I want to filter out some broad peak from the result, that why I set up --broad-cutoff. It depends on what you want to get from the macs output.

1
Entering edit mode
5.2 years ago
ivivek_ngs ★ 5.1k

0
Entering edit mode

Thanks. Can you give me some example, how to set the q-value, p-value and broad-cutoff, then I can filter out -log10(qvalue) <2 in my result list？

0
Entering edit mode

The parameter to use is -p for p-value set and -q for FDR set. It depends on what you want to set. What I usually perform depends mostly on my data coverage. If you have low data coverage then you can go with relaxed cut-offs but using arbitrary is not what I would suggest. So you can try to run the analysis at different -q values (0.05,0.01) while keeping the --broad-cutoff as 0.1 or you can set the similar same value of q-value cut-off as the --broad-cut-off say 0.05 or 0.01. Once you have the peaks for all these runs you can simply count the number of peaks from each run to estimate the call outputs. The idea is to set same value of --broad-cutoff as that of -p or -q that you are setting for the enrichment process of calling a peak. You need to test a bit to get a much better idea rather than selecting on arbitrary cut-off. May be playing with different parameter values gives you some idea about the output. Then you will not need to again filter for -log10(p or q value). What you can still do is select on fold change as you do in RNA-Seq to get much stringent peaks.

0
Entering edit mode

I think you didn't get my point. My question is how to set -q -p or --broad-cutoff to filter out -log10(qvalue) <2 in my post example above. My post is ask why my result didn't filter out -log10(qvalue) <2, the options used are in the result file. Thanks for your support.

0
Entering edit mode

Ah ok, so how many peaks are there in the *.broadpeak file and broadpeak.xls. It should be same. Then can you show me if you have -log10(q value) in broad peak file less than 2. Actually I have in my experience seen that the default qvalue is sometimes not working unless you mention it while running. So I often put the value in the command line. So if you run the command line

callpeak -t /mouseAnalysis/data/align/CYJ-1_dedup.bam -c  /mouseAnalysis/data/align/CYJ-1i_dedup.bam -f BAM -n CYJ-1_CYJ-1i_macs.tmp -g mm -B --broad --broad-cutoff 0.01 -q 0.01 --fix-bimodal --nomodel --extsize=270 --outdir  /mouseAnalysis/data/peakCall/


What is the output in both .broadpeak and broadpeak.xls and see still you get values less than 2 in -log10(q value). The handle I found does not work if you do not mention the paramter -qvalue while running the command.

0
Entering edit mode

Thanks a lot ! Using q-value can filer out -log10(qvalue) <2 peaks successfully.