I was performing some tests using the peak calling software MACS2 and I stumbled upon an issue: using pair-ended reads as input, I get different results using the same data set, but in
BAMPE format and in
BEDPE format. If I use a data set consisting of single-end reads, I do get the same results using
BED. MACS2 was used calling:
/macs2/bin/macs2 callpeak -t ./input.bedpe -f BEDPE -g hs -n bedpe
/macs2/bin/macs2 callpeak -t ./input.bampe -f BAMPE -g hs -n bampe
respectively. The conversion from from BAMPE to BEDPE was performed using:
samtools sort -l 0 -n -O bam <input_bampe_file> | samtools view -bf 0x2 | bedtools bamtobed -bedpe -i - > <output_bedpe_file>
Note that the PCR duplicates were removed from the
BAMPE file. Using the
BEDPE format, I get a resulting peaks file of 67711 entries and using
BAMPE, I get 58330 entries. Am I doing something wrong in the conversion between the two formats or did anyone else experience this issue with MACS2? Thank you
Can you look at the .xls file MACS2 produces and see if the number of reads counted is different between your BAMPE and BEDPE files? My guess is that with the BEDPE format you're only using reads which are properly paired to measure fragment size / pileup. With the BAMPE format it's using only the properly-paired reads to measure fragment size, but using all reads to measure pileup.
Hi James. Thanks for the reply. Looking at the xls files generated during the runs:
Even if I convert from
samtools view -bf 0x2step which is supposed to convert only properly-paired reads according to the doc then I get 67720 entries in the resulting peak file from
BEDPEas compared to the previously 67711.
Hmm okay I'm not sure, maybe best to post an issue to the MACS GitHub repository (https://github.com/taoliu/MACS/issues) or the MACS Google group (https://groups.google.com/forum/#!forum/macs-announcement). Tao Liu will obviously know more about the inner workings.
I've sent an email to the support team earlier today, let's see if they answer.. thought it might be quicker via biostars as there are more ppl using MACS :]
OK. I've performed a bunch of tests, but still no decent conclusion has been reached. If I use single-end reads as input in MACS2, from both
BEDformats I get identical resulting peak files. If I use pair-ended reads in
BEDPEI get different resulting peak files using the same data set. Line count peak file:
BEDcreated from BAMPE: 90685
samtools sort -l 0 -n -O bam <input_bampe_file> | bedtools bamtobed -bedpe -i - > <output_bedpe_file>: 67720
samtools sort -l 0 -n -O bam <input_bampe_file> | samtools view -bf 0x2 | bedtools bamtobed -bedpe -i - > <output_bedpe_file>(which keeps only the properly-paired reads) : 67711
using not the standard
cut -f 1,2,6 <input_bedpe_file> | sort -k1,1 -k2,2n -k3,3n > <output_bed_file>: 60998
:/ reading through the doc of MACS2 still don't find it clear enough. They use a special BEDPE format that I used in the last test: "The BEDPE format is a simplified and more flexible BED format, which only contains the first three columns defining the chromosome name, left and right position of the fragment from Paired-end sequencing. Note, this is NOT the same format used by BEDTOOLS, and BEDTOOLS version of BEDPE is actually not in a standard BED format. " Any idea why?
did you ever got to the bottom of this problem? Kinda fighting same thing here. I did find this answer by taoliu https://github.com/taoliu/MACS/issues/183 where he points out that macs2 does read selection that can be simulated using: samtools view -b -f 2 -F 4 -F 8 -F 256 -F 512 -F 2048 How ever still I'm getting different results using BAMPE or BEDPE, I've found that macs estimates different fragment lengths for each file format.
Have you sorted this out? I ran into the same problem. MACS2 BAMPE and BEDPE gave dramatically different "mean fragment size" I currently just used BEDPE, as the fragment size in BAMPE seemed mistakenly small to me.