Question: Help understanding MACS2 --extsize and --shift
1
gravatar for datascientist28
2.5 years ago by
University of Washington
datascientist28360 wrote:

So I'm trying to understand the --shift and --extsize parameters in MACS2. I have inherited an ATAC-seq that used the following MACS2 command.

macs2 callpeak -t ATAC_sample-1.bam --nomodel --shift -100 --extsize 200 -g 1.5e9 -f BAM

I'm trying to see how I can apply these principles to a Rscript I have that converts my raw BAM file to a coverage file? The full script can be looked at here but the main lines I'm looking at are:

aln <- as(aln, "GRanges") # Converts each read mapping to GRanges coord 

aln <- resize(aln, 150) # Extended the reads to the fragment length (4 previous exp)

cov <- coverage(aln) # Get Coverages nucleotide

The definitions for extsize and shift can be seen in the documentation but I'll copy it for reference.

--extsize

While '--nomodel' is set, MACS uses this parameter to extend reads in 5'->3' direction to fix-sized fragments. For example, if the size of binding region for your transcription factor is 200 bp, and you want to bypass the model building by MACS, this parameter can be set as 200. This option is only valid when --nomodel is set or when MACS fails to build model and --fix-bimodal is on.

--shift

Note, this is NOT the legacy --shiftsize option which is replaced by --extsize! You can set an arbitrary shift in bp here. Please Use discretion while setting it other than default value (0). When --nomodel is set, MACS will use this value to move cutting ends (5') then apply --extsize from 5' to 3' direction to extend them to fragments. When this value is negative, ends will be moved toward 3'->5' direction, otherwise 5'->3' direction. Recommended to keep it as default 0 for ChIP-Seq datasets, or -1 * half of EXTSIZE together with --extsize option for detecting enriched cutting loci such as certain DNAseI-Seq datasets. Note, you can't set values other than 0 if format is BAMPE or BEDPE for paired-end data. Default is 0.

Here are some examples for combining --shift and --extsize:

EXAMPLE 1: To find enriched cutting sites such as some DNAse-Seq datasets. In this case, all 5' ends of sequenced reads should be extended in both direction to smooth the pileup signals. If the wanted smoothing window is 200bps, then use '--nomodel --shift -100 --extsize 200'.

So to summarize my question is how should I resize my raw reads so that the coverage will reflect the --shift and --extsize used by MACS2? From reading the documentation it seems I should shift the reads 100 bp in both directions because it is extending the reads 200 bp 5' -> 3' (--extsze) and back 100 bp with (--shift).

atac-seq macs2 peak calling • 4.7k views
ADD COMMENTlink modified 18 months ago by xd_d90 • written 2.5 years ago by datascientist28360
4
gravatar for Devon Ryan
2.5 years ago by
Devon Ryan88k
Freiburg, Germany
Devon Ryan88k wrote:

You'll need to endoapply() a function that does the shift() according to the strand, since that's otherwise ignored. Then resize(aln, 200). Alternatively, use idx = which(strand(aln) == '+') for each strand and directly adjust the start accordingly (then resize()).

ADD COMMENTlink written 2.5 years ago by Devon Ryan88k
2

I appreciate your help! The function it self seems easy. It's just herd for me to understand biologically how to write the function (if that makes sense). Based on what you are saying, are you saying that you --shift -100 means shifting read -100 from the start spot?

so lets say I have ranges

Chr1 4300 4352 +

Chr2 9720 9820 -

--shift -100 turns this into:

Chr1 4200 4252 +

Chr2 9820 9920 -

Is that correct?

Than --extsize 200 takes the output of that and extends the fragment size to 200

Chr 4200 4400 +

Chr 9720 9920 -

Is that right??

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by datascientist28360
1

What you showed is how I understand it happening as well :)

ADD REPLYlink written 2.5 years ago by Devon Ryan88k
0
gravatar for xd_d
18 months ago by
xd_d90
xd_d90 wrote:

Is the --extsize equal the window size ?

ADD COMMENTlink written 18 months ago by xd_d90

No it's the extension size that it will extend the reads to for smoothing the pileup

ADD REPLYlink written 18 months ago by datascientist28360
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 983 users visited in the last hour