Question: MACS2 produces different results using BAMPE and BEDPE
1
gravatar for Marius
2.3 years ago by
Marius20
Marius20 wrote:

Hi,

I was performing some tests using the peak calling software MACS2 and I stumbled upon an issue: using pair-ended reads as input, I get different results using the same data set, but in BAMPE format and in BEDPE format. If I use a data set consisting of single-end reads, I do get the same results using BAM and BED. MACS2 was used calling:

/macs2/bin/macs2 callpeak -t ./input.bedpe -f BEDPE -g hs -n bedpe

and

/macs2/bin/macs2 callpeak -t ./input.bampe -f BAMPE -g hs -n bampe

respectively. The conversion from from BAMPE to BEDPE was performed using:

samtools sort -l 0 -n -O bam <input_bampe_file> | samtools view -bf 0x2 | bedtools bamtobed -bedpe  -i - > <output_bedpe_file>

Note that the PCR duplicates were removed from the BAMPE file. Using the BEDPE format, I get a resulting peaks file of 67711 entries and using BAMPE, I get 58330 entries. Am I doing something wrong in the conversion between the two formats or did anyone else experience this issue with MACS2? Thank you

pair-ended chip-seq macs2 • 2.0k views
ADD COMMENTlink written 2.3 years ago by Marius20
1

Can you look at the .xls file MACS2 produces and see if the number of reads counted is different between your BAMPE and BEDPE files? My guess is that with the BEDPE format you're only using reads which are properly paired to measure fragment size / pileup. With the BAMPE format it's using only the properly-paired reads to measure fragment size, but using all reads to measure pileup.

ADD REPLYlink written 2.3 years ago by James Ashmore2.6k

Hi James. Thanks for the reply. Looking at the xls files generated during the runs:

For BEDPE:

# fragment size is determined as 100 bps
# total fragments in treatment: 29486407
# fragments after filtering in treatment: 19212868
# maximum duplicate fragments in treatment = 1
# Redundant rate in treatment: 0.35
# d = 100

For BAMPE:

# fragment size is determined as 264 bps
# total fragments in treatment: 31332236
# fragments after filtering in treatment: 23251700
# maximum duplicate fragments in treatment = 1
# Redundant rate in treatment: 0.26
# d = 264

Even if I convert from BAMPE to BEDPE without the samtools view -bf 0x2 step which is supposed to convert only properly-paired reads according to the doc then I get 67720 entries in the resulting peak file from BEDPE as compared to the previously 67711.

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by Marius20

Hmm okay I'm not sure, maybe best to post an issue to the MACS GitHub repository (https://github.com/taoliu/MACS/issues) or the MACS Google group (https://groups.google.com/forum/#!forum/macs-announcement). Tao Liu will obviously know more about the inner workings.

ADD REPLYlink written 2.3 years ago by James Ashmore2.6k

I've sent an email to the support team earlier today, let's see if they answer.. thought it might be quicker via biostars as there are more ppl using MACS :]

ADD REPLYlink written 2.3 years ago by Marius20

OK. I've performed a bunch of tests, but still no decent conclusion has been reached. If I use single-end reads as input in MACS2, from both BAM and BED formats I get identical resulting peak files. If I use pair-ended reads in BAMPE and BEDPE I get different resulting peak files using the same data set. Line count peak file:

using BAMPE: 58330

using "classic" BED created from BAMPE: 90685

using BEDPE created with samtools sort -l 0 -n -O bam <input_bampe_file> | bedtools bamtobed -bedpe -i - > <output_bedpe_file>: 67720

using BEDPE created with samtools sort -l 0 -n -O bam <input_bampe_file> | samtools view -bf 0x2 | bedtools bamtobed -bedpe -i - > <output_bedpe_file> (which keeps only the properly-paired reads) : 67711

using not the standard BEDPE format, but cut -f 1,2,6 <input_bedpe_file> | sort -k1,1 -k2,2n -k3,3n > <output_bed_file>: 60998

:/ reading through the doc of MACS2 still don't find it clear enough. They use a special BEDPE format that I used in the last test: "The BEDPE format is a simplified and more flexible BED format, which only contains the first three columns defining the chromosome name, left and right position of the fragment from Paired-end sequencing. Note, this is NOT the same format used by BEDTOOLS, and BEDTOOLS version of BEDPE is actually not in a standard BED format. " Any idea why?

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by Marius20

Hi,

did you ever got to the bottom of this problem? Kinda fighting same thing here. I did find this answer by taoliu https://github.com/taoliu/MACS/issues/183 where he points out that macs2 does read selection that can be simulated using: samtools view -b -f 2 -F 4 -F 8 -F 256 -F 512 -F 2048 How ever still I'm getting different results using BAMPE or BEDPE, I've found that macs estimates different fragment lengths for each file format.

ADD REPLYlink written 2.1 years ago by mcjmigdal10

Hi,

Have you sorted this out? I ran into the same problem. MACS2 BAMPE and BEDPE gave dramatically different "mean fragment size" I currently just used BEDPE, as the fragment size in BAMPE seemed mistakenly small to me.

ADD REPLYlink written 21 months ago by Xinwei Han0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 939 users visited in the last hour