question about macs2 --broad-cutoff
1
3
Entering edit mode
8.1 years ago
Yu ▴ 140

Dear all,

I am confused with the --broad-cutoff option in macs2. In the manual, it said that --broad-cutoff is cutoff for broad region. This option is not available unless --broad is set. If -p is set, this is a pvalue cutoff, otherwise, it's a qvalue cutoff. DEFAULT: 0.1

If I understand it correctly, if I only set the --broad-cutoff 0.01, the cutoff will filter out broad peak with q-value > 0.01. However, when I processing a histone modification ChIP-seq data, I got some result with q-value > 0.01 [-log10(qvalue) <2].

Here is the result file.

    # This file is generated by MACS version 2.1.0.20150731
    # Command line: callpeak -t /mouseAnalysis/data/align/CYJ-1_dedup.bam -c  /mouseAnalysis/data/align/CYJ-1i_dedup.bam -f BAM -n CYJ-1_CYJ-1i_macs.tmp -g mm -B --broad --broad-cutoff 0.01
--fix-bimodal --nomodel --extsize=270 --outdir  /mouseAnalysis/data/peakCall/
    # ARGUMENTS LIST:
    # name = CYJ-1_CYJ-1i_macs.tmp
    # format = BAM
    # ChIP-seq file = ['/mouseAnalysis/data/align/CYJ-1_dedup.bam']
    # control file = ['/mouseAnalysis/data/align/CYJ-1i_dedup.bam']
    # effective genome size = 1.87e+09
    # band width = 300
    # model fold = [5, 50]
    # qvalue cutoff for narrow/strong regions = 5.00e-02
    # qvalue cutoff for broad/weak regions = 1.00e-02
    # Larger dataset will be scaled towards smaller dataset.
    # Range for calculating regional lambda is: 1000 bps and 10000 bps
    # Broad region calling is on

    # tag size is determined as 49 bps
    # total tags in treatment: 8279763
    # tags after filtering in treatment: 8279763
    # maximum duplicate tags at the same position in treatment = 1
    # Redundant rate in treatment: 0.00
    # total tags in control: 9357359
    # tags after filtering in control: 9357359
    # maximum duplicate tags at the same position in control = 1
    # Redundant rate in control: 0.00
    # d = 270
    chr     start   end     length  pileup  -log10(pvalue)  fold_enrichment -log10(qvalue)  name
    chr11   3132147 3132916 770     46.55   5.11039 1.29032 1.91284 CYJ-1_CYJ-1i_macs.tmp_peak_1
    chr11   3168653 3168940 288     39.21   4.37530 1.18763 1.48268 CYJ-1_CYJ-1i_macs.tmp_peak_2
    chr11   3193021 3193581 561     46.44   5.97673 1.34145 2.72753 CYJ-1_CYJ-1i_macs.tmp_peak_3

My questions are:

Do I understand the --broad-cutoff correctly? Why the filter doesn't work?

Thanks.

ChIP-Seq MACS2 macs2 broadpeak broad-cutoff • 11k views
ADD COMMENT
0
Entering edit mode

Hello, I am a little confused if we set q values already, why we need the --broad-cutoff to filter again, especially when you use q and cutoff both at 0.1? Thanks!

ADD REPLY
0
Entering edit mode

This question is 5 years old. If you have a question, consider opening a new one rather than trying to piggyback. Note that MACS has also been updated since this question was posted and parameters may have changed.

ADD REPLY
0
Entering edit mode

I think at that time, I want to filter out some broad peak from the result, that why I set up --broad-cutoff. It depends on what you want to get from the macs output.

ADD REPLY
1
Entering edit mode
8.1 years ago
ivivek_ngs ★ 5.2k

Yes you are correct the broad cut-off will not work unless you point out the exact pvalue or qvalue threshold. Unless you do that you will see all the reads for the broad-cutoff you set , you see a lot of data since your FDR cut-off is set at 10% and then the broad finding is applied on them. If you run both the narrow call and broad call with the q-value 10% and for broad you ad extra paramteer of --broad-cutoff 0.01 then you can fetch the length of the peaks and plot a histogram to understand the peak length distribution to understand how broad are the peaks. So to properly apply the --broad-cutoff you have to play with p-value or q-value score along with the broad-cutoff option. You can take a look here you can see for borad calls you need to set a stringent p or q value cut-off parameters along with broad-cutoff to get proper output. This is not very clearly mentioned in the document so you need to check it yourself using varied runs to see what you get in terms of the peaks.

ADD COMMENT
0
Entering edit mode

Thanks. Can you give me some example, how to set the q-value, p-value and broad-cutoff, then I can filter out -log10(qvalue) <2 in my result list?

ADD REPLY
0
Entering edit mode

The parameter to use is -p for p-value set and -q for FDR set. It depends on what you want to set. What I usually perform depends mostly on my data coverage. If you have low data coverage then you can go with relaxed cut-offs but using arbitrary is not what I would suggest. So you can try to run the analysis at different -q values (0.05,0.01) while keeping the --broad-cutoff as 0.1 or you can set the similar same value of q-value cut-off as the --broad-cut-off say 0.05 or 0.01. Once you have the peaks for all these runs you can simply count the number of peaks from each run to estimate the call outputs. The idea is to set same value of --broad-cutoff as that of -p or -q that you are setting for the enrichment process of calling a peak. You need to test a bit to get a much better idea rather than selecting on arbitrary cut-off. May be playing with different parameter values gives you some idea about the output. Then you will not need to again filter for -log10(p or q value). What you can still do is select on fold change as you do in RNA-Seq to get much stringent peaks.

ADD REPLY
0
Entering edit mode

I think you didn't get my point. My question is how to set -q -p or --broad-cutoff to filter out -log10(qvalue) <2 in my post example above. My post is ask why my result didn't filter out -log10(qvalue) <2, the options used are in the result file. Thanks for your support.

ADD REPLY
0
Entering edit mode

Ah ok, so how many peaks are there in the *.broadpeak file and broadpeak.xls. It should be same. Then can you show me if you have -log10(q value) in broad peak file less than 2. Actually I have in my experience seen that the default qvalue is sometimes not working unless you mention it while running. So I often put the value in the command line. So if you run the command line

callpeak -t /mouseAnalysis/data/align/CYJ-1_dedup.bam -c  /mouseAnalysis/data/align/CYJ-1i_dedup.bam -f BAM -n CYJ-1_CYJ-1i_macs.tmp -g mm -B --broad --broad-cutoff 0.01 -q 0.01 --fix-bimodal --nomodel --extsize=270 --outdir  /mouseAnalysis/data/peakCall/

What is the output in both .broadpeak and broadpeak.xls and see still you get values less than 2 in -log10(q value). The handle I found does not work if you do not mention the paramter -qvalue while running the command.

ADD REPLY
0
Entering edit mode

Thanks a lot ! Using q-value can filer out -log10(qvalue) <2 peaks successfully.

ADD REPLY

Login before adding your answer.

Traffic: 1997 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6