Question

MACS3 Parameter adjustments

11 weeks ago

María José ▴ 10

I’m diving into the use of MACS3 for analyzing paired-end sequencing data, especially focusing on telomeric regions. I’m exploring how variations in the keep-dup parameter impact peak detection and coverage assessment.

In my experiments, setting --keep-dup all retains all tags, while switching to --keep-dup 1 drastically reduces the tag count from approximately 46 million to just over 10 million. This raises the question of how to evaluate coverage accurately given the substantial drop in retained tags.

I’m also considering additional metrics beyond peak length and chromosome distribution, using the -B and --SPMR options to produce signal tracks that I compare as BigWig files. Are these sufficient for this analysis, and for deciding which parameter settings work better?

Additionally, I’m contemplating the -f BAMPE option for more precise insert size estimation, given the paired-end nature of my data. However, reads in the enriched telomeric regions may have low mapping quality (MAPQ = 0), which makes me wonder about the effect on insert size accuracy. Should I continue with the default callpeak behavior, which keeps only the left mate (5’ end tag)?

Thank you!

human libraries illumina callpeak macs3 • 715 views


What type of data is this, e.g. ChIP-seq?

That seems like a high duplication rate. Is this data enriched for telomere sequences?
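To put a number on that, here is a minimal Python sketch of the fraction of tags dropped between the two runs. The counts are the rounded figures from the question (about 46 million and about 10 million), and `removed_fraction` is a hypothetical helper name, not part of MACS3.

```python
def removed_fraction(tags_all: int, tags_dedup: int) -> float:
    """Fraction of tags dropped when going from --keep-dup all to --keep-dup 1."""
    if tags_all <= 0 or not 0 <= tags_dedup <= tags_all:
        raise ValueError("expected 0 <= tags_dedup <= tags_all and tags_all > 0")
    return 1 - tags_dedup / tags_all


# Rounded counts from the question, so the result is only approximate.
print(f"{removed_fraction(46_000_000, 10_000_000):.1%}")  # 78.3%
```

On those rounded counts, roughly three quarters of the tags are treated as duplicates. That is very high for a typical ChIP-seq library, but it may be expected if most fragments come from a short repetitive target.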

-B and --SPMR are sufficient for generating signal files.

BAMPE format may make sense here, as it relies on where the reads are already mapped, so you won't be incorporating additional information anyway. It should be at least as accurate as MACS3's method for estimating insert sizes from single-end data, because that method also relies on read alignments but lacks the benefit of pairing.

BAMPE makes less sense if you're more interested in read ends, e.g. ATAC-seq, rather than the middle of fragments.
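The difference between the two modes can be illustrated with a small Python sketch. It is a simplified model, not MACS3's code: in paired-end mode the signal covers the whole sequenced fragment, while in single-end mode only the 5' end of a read is kept and then extended by an estimated fragment length. The function names, coordinates, and the value of `d` are assumptions made up for illustration.

```python
def fragment_from_pair(leftmost_start: int, template_length: int) -> tuple[int, int]:
    """Paired-end view: the interval spanned by both mates (half-open)."""
    return (leftmost_start, leftmost_start + template_length)


def extended_tag(five_prime: int, strand: str, d: int) -> tuple[int, int]:
    """Single-end view: extend the 5' end by an estimated fragment length d."""
    if strand == "+":
        return (five_prime, five_prime + d)
    return (max(0, five_prime - d), five_prime)


# Hypothetical pair: observed fragment of 180 bp vs. an estimated d of 200 bp.
print(fragment_from_pair(1000, 180))   # (1000, 1180)
print(extended_tag(1000, "+", 200))    # (1000, 1200)
```

If the read ends themselves are the signal of interest, neither interval is what you want, which is the ATAC-seq caveat above.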

Overall, I'm thinking it would be good to perform parallel analysis with and without duplicates/multimappers to see if the downstream conclusions are fundamentally different.
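One way to set up such a parallel comparison is sketched below in Python. It only builds and prints the `macs3 callpeak` command lines so they can be inspected before running. The file name `sample.bam`, the output directory, the run names, and the `-g hs` genome setting are placeholders, and no control (`-c`) is included because the thread does not mention one.

```python
import shlex


def callpeak_cmd(bam, name, keep_dup, fmt="BAMPE", genome="hs", outdir="macs3_out"):
    """Build one macs3 callpeak command as an argument list (nothing is executed)."""
    return [
        "macs3", "callpeak",
        "-t", bam,                    # treatment BAM (placeholder path)
        "-f", fmt,                    # BAMPE: use the paired-end fragments
        "-g", genome,                 # effective genome size shortcut
        "-n", name,                   # prefix for output files
        "--outdir", outdir,
        "-B", "--SPMR",               # bedGraph signal, scaled per million reads
        "--keep-dup", str(keep_dup),  # the setting being compared
    ]


# Same input and settings, differing only in duplicate handling.
for keep_dup in ("all", "1", "auto"):
    cmd = callpeak_cmd("sample.bam", f"sample_keepdup_{keep_dup}", keep_dup)
    print(shlex.join(cmd))
```

Keeping everything but `--keep-dup` fixed means any difference in peaks or signal tracks can be attributed to duplicate handling alone.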

ADD REPLY • link 11 weeks ago by rfran010 ★ 1.6k

Thanks @rfran010. I'm working with DNA that has been probe-selected using biotinylated telomeric repeats, which are then captured with streptavidin-coated magnetic beads. Would it be beneficial to calculate the Fraction of Reads in Peaks (FRiP) in this context? If I use --keep-dup 1, would you recommend using samtools markdup or Picard's MarkDuplicates to filter the original BAM file so that FRiP is calculated accurately?

ADD REPLY • link 11 weeks ago by María José ▴ 10

If these have a MAPQ of 0, then you are probably forced to convert to BEDPE format and continue with that file, as (I guess) MACS will ignore reads with MAPQ=0. Does it? Not sure. Many tools do.

ADD REPLY • link 11 weeks ago by ATpoint 89k

I'm thinking ideally there would be UMIs. Since it's targeting an isolated, repetitive region, it's hard to determine if all the duplicates are from the repetitive, targeted nature of the assay, or, since general library complexity is lower, are they mostly introduced by PCR?

If you use samtools or MarkDuplicates, then you would choose --keep-dup all to avoid MACS3's duplicate finding mode. I'm not sure if one would perform better than the other. I tend to use MarkDuplicates.

You may also consider the --keep-dup auto option.
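As documented for MACS, the auto setting derives the maximum number of tags allowed at the exact same location from a binomial distribution with a 1e-5 p-value cutoff. The Python sketch below approximates that idea with a Poisson distribution. It is not MACS3's actual code, and the function name, the genome size of 2.7e9, and the rounded tag count are assumptions for illustration.

```python
import math


def max_dup_tags(total_tags: float, genome_size: float, p_cutoff: float = 1e-5) -> int:
    """Smallest k with P(X <= k) >= 1 - p_cutoff, X ~ Poisson(total_tags / genome_size).

    Poisson is used as an approximation to Binomial(total_tags, 1 / genome_size).
    """
    lam = total_tags / genome_size  # expected tags per position
    if lam > 50:
        raise ValueError("sketch only handles sparse coverage (lam <= 50)")
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    k = 0
    while cdf < 1 - p_cutoff:
        k += 1
        term *= lam / k    # P(X = k) from P(X = k - 1)
        cdf += term
    return max(1, k)       # always allow at least one tag per position


# About 46 million tags over an assumed 2.7e9 bp genome.
print(max_dup_tags(46_000_000, 2.7e9))  # 2
```

Under these assumptions the threshold comes out only slightly more permissive than --keep-dup 1, because the calculation assumes tags are spread over the whole genome rather than concentrated on a small target such as telomeric repeats.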

I still think running parallel analyses with and without duplicates would be good. It's hard to say if FRIP will be accurate or not since that would depend on if they are true duplicates or just fragments from the same region/with the same sequence.

Another consideration that could be helpful: if the library fragments are shorter than the read length, then marking duplicates could be more reliable, since fragments/reads would have more random sizes and true duplicates could be better distinguished.
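Since FRiP depends on which fragments go into the denominator, it can help to compute it on both the full and the deduplicated fragment sets and compare. Below is a minimal pure-Python sketch, assuming fragments and peaks are already available as `(chrom, start, end)` tuples with half-open coordinates. The function names and the toy intervals are made up for illustration, and a fragment counts as "in peaks" if it overlaps a peak by at least one base.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_peaks(peaks):
    """Merge overlapping peaks per chromosome into sorted, disjoint intervals."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s <= out[-1][1]:
                out[-1][1] = max(out[-1][1], e)
            else:
                out.append([s, e])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def frip(fragments, peaks):
    """Fraction of fragments overlapping any peak by at least 1 bp."""
    merged = merge_peaks(peaks)
    total = in_peaks = 0
    for chrom, start, end in fragments:
        total += 1
        if chrom not in merged:
            continue
        starts, ends = merged[chrom]
        i = bisect_right(ends, start)  # first peak ending after the fragment start
        if i < len(starts) and starts[i] < end:
            in_peaks += 1
    return in_peaks / total if total else 0.0


# Toy data only: two overlapping peaks and four fragments.
peaks = [("chr1", 100, 200), ("chr1", 150, 300)]
fragments = [("chr1", 50, 120), ("chr1", 300, 400), ("chr2", 0, 10), ("chr1", 250, 260)]
print(frip(fragments, peaks))  # 0.5
```

Running this once on all fragments and once on the deduplicated set gives two FRiP values. A large gap between them would suggest that the duplicates sit mostly inside the peaks.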

ADD REPLY • link 11 weeks ago by rfran010 ★ 1.6k

Thanks. I have used the options --keep-dup all and --keep-dup 1 without first filtering with samtools or Picard. Would you recommend the auto option over the all option? Thanks for the support and assistance.

ADD REPLY • link 11 weeks ago by María José ▴ 10