Question

Conversion Form Broad Peak Format Of Macs2 To Narrow Peak Format

1

Entering edit mode

11.2 years ago

Dataminer ★ 2.8k

Hi!

Is their an easier and simpler way (if their is ?) to compute the p-value and signal enrichment of the peaks present in broad peak file of macs2.

Thank you

chip-seq • 6.6k views

ADD COMMENT • link updated 11.2 years ago by Sukhi Singh 11k • written 11.2 years ago by Dataminer ★ 2.8k

score 5 · Answer 1 · 2013-02-21

According to Tao;

q-value is analogue to FDR. http://en.wikipedia.org/wiki/False_discovery_rate

Please also pay attention to the column name in MACS1 or 2 output. There are -logqvalue column or FDR(percentage) column.

Also. MACS1 and 2 use different methods to calculate FDR that I have explained in previous emails.

Also,

In MACSv1.4, FDR is calculated by swapping treatment and control. MACSv14 assumes all peaks called under a cutoff in this way are false positives. So it can calculate empirical FDR for each pvalue cutoff. However this method is hugely influenced by unbalanced sequencing depth. For example, if control sample is much larger than treatment, FDR would be overestimated so you would have less 'good' peaks above cutoff.

In MACSv2, I use another approach. First, pvalue are calculated at every basepair in the genome, then I adopt Benjamini-Hochberg to correct multiple comparisons, convert pvalue into qvalue or minimum FDR that the peak is significant. This method is more robust in my experience.

So, the interconversion will not give the same result as they are calculated differently plus these two tools are used for relatively different purposes and results are effected by the default values plus the read density of the samples in question. Best is to run the samples again with Macs14.

Sources

Update :

Just ran Macs14 and Macs2, with same sample and control file

Macs2 [-q 0.01]

sed 1,24d test_peaks.xls | head
chr    start    end    length    abs_summit    pileup    -log10(pvalue)    fold_enrichment    -log10(qvalue)
chr1    4770143    4770617    475    4770441    22.00    10.06    5.39    6.92
chr1    4774970    4775332    363    4774970    21.00    4.59    2.67    2.17
chr1    4785399    4785693    295    4785544    39.00    18.79    6.27    15.11
chr1    4858164    4858580    417    4858395    28.00    11.08    4.76    7.86
chr1    5018564    5018786    223    5018683    22.00    12.43    7.12    9.11
chr1    6214341    6214589    249    6214361    22.00    8.33    4.35    5.36
chr1    7088437    7088737    301    7088545    41.00    18.89    5.97    15.20
chr1    7124854    7125079    226    7125006    16.00    6.05    4.08    3.37
chr1    7129919    7130171    253    7130091    21.00    11.95    7.13    8.66

Macs14 [FDR cutoff=1.5]

head ../fdrFilterPeaks_1.5_mock_xx.bed 
chr    start    end    length    summit    tags    -10*log10(pvalue)    fold_enrichment    FDR(%)
chr1    4769731    4771497    1767    711    95    113.29    4.4    0.4
chr1    4773190    4775974    2785    1781    132    144.2    4.78    0.3
chr1    4776909    4778588    1680    1397    65    56.88    3.52    1.4
chr1    4784917    4786379    1463    629    93    115.63    6.72    0.38
chr1    4855079    4859029    3951    2726    257    78.67    3.04    0.71
chr1    4899973    4900844    872    476    32    71.88    6.35    0.88
chr1    4907855    4908681    827    421    29    62.32    4.76    1.16
chr1    5017615    5021374    3760    1071    170    315.75    6.97    0.23
chr1    5022679    5023934    1256    974    53    86.11    4    0.55

Though the cutoff vary a little, but still you can see, the size of peaks are really different implying the number of tags/pileup is different. If one would have to convert the file interchangeably, this would produce so much noise, that I wouldn't trust it in end. The physical location of peaks are also different, which is saying the protein is binding at relatively different places, which might/not be a significant problem.

By the way, why you would want to convert them.

Fisher test can be used to rank the peaks or by comparing two samples, you could tell few peaks appearing in a specific region are sample specific or universal or noise. What you gonna do with the p-value after applying the fisher test to the tag density of sample over control after the Macs2 run. Also, how you gonna get the tagDensity of control file from Macs2 plus its already kind of calculated in the fold_enrichment column.