I have seen that shifting mapped reads by +4 bp and -5 bp is common practice in ATAC-seq.
One source says: "reads should be shifted + 4 bp and − 5 bp for positive and negative strand respectively, to account for the 9-bp duplication created by DNA repair of the nick by Tn5 transposase and achieve base-pair resolution of TF footprint and motif-related analyses"
Another says: "When the Tn5 transposase cuts open chromatin regions, it introduces two cuts that are separated by 9 bp. Therefore, ATAC-seq reads aligning to the positive and negative strands need to be adjusted by +4 bp and -5 bp respectively to represent the center of the transposase binding site."
I'm a little confused. Is the shift mainly meant to center the peak or to avoid the duplication?
Does anyone have a good illustration of this? What happens to the peak calls if this step is skipped?
I personally ignore the shifts unless I am plotting cutting events around TF motifs (footprint plots). For peak calling you can safely ignore them. Peaks span several hundred bp, so I do not see how a shift of a few bp would affect them.
The shifting doesn't serve any real purpose unless you want to plot the exact cut location (e.g., when searching for motifs). It simply harks back to one of the first ATAC-seq papers, where they performed this adjustment to account for the 9-base single-stranded overhang on each end of the fragment. Papers since have simply followed suit. A vastly more sensible strategy would be to use the 9 bases on each end of the fragment, since these are bases that are necessarily open.
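To make the "use the 9 bases on each end" idea concrete, here is a minimal sketch. It assumes fragments are given as 0-based, half-open (start, end) coordinates; the function name `open_end_windows` is hypothetical and not taken from any ATAC-seq tool.

```python
def open_end_windows(frag_start, frag_end, width=9):
    """Return the two windows of `width` bases at the ends of a fragment.

    Coordinates are assumed to be 0-based and half-open, as in BED.
    """
    if frag_end - frag_start < width:
        raise ValueError("fragment is shorter than the requested window")
    left = (frag_start, frag_start + width)   # first 9 bases of the fragment
    right = (frag_end - width, frag_end)      # last 9 bases of the fragment
    return left, right


# A fragment covering bases 100..249 yields one window at each end.
print(open_end_windows(100, 250))  # ((100, 109), (241, 250))
```

Each fragment then contributes two short intervals rather than two single shifted positions.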
Turns out +4/-4 is the correct shift. You can see this when you make separate bigwigs from the + and - strands: with +4/-4 the reads align with each other, while with +4/-5 they are off by one. The main idea is that transposition events should be mapped to the same base pair regardless of strand.
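A small sketch of the off-by-one, under a simplified model that is my assumption rather than something from a specific pipeline: one Tn5 insertion duplicates the 9 bp at 0-based positions p to p+8, the + strand read starts at p, and the - strand read has its 5' end at p+8. The function names are hypothetical.

```python
# Toy model (an assumption for illustration): one Tn5 insertion
# duplicates the 9 bp at 0-based positions p .. p+8.
DUPLICATION = 9


def read_five_prime_ends(p):
    """Return the inclusive 0-based 5' ends of the two reads flanking the insertion."""
    plus_5p = p                      # + strand read starts at the first duplicated base
    minus_5p = p + DUPLICATION - 1   # - strand read's 5' end is the last duplicated base
    return plus_5p, minus_5p


def shift(pos, strand, plus_shift=4, minus_shift=4):
    """Shift a 5' end toward the center of the duplication."""
    return pos + plus_shift if strand == "+" else pos - minus_shift


plus_5p, minus_5p = read_five_prime_ends(100)   # 100 and 108

# Inclusive coordinates: +4/-4 puts both reads on the same base.
print(shift(plus_5p, "+"), shift(minus_5p, "-"))   # 104 104
# Inclusive coordinates with -5: the minus strand lands one base off.
print(shift(minus_5p, "-", minus_shift=5))          # 103
# BED-style exclusive end (minus_5p + 1) with -5: same base again.
print(shift(minus_5p + 1, "-", minus_shift=5))      # 104
```

In this model, whether -4 or -5 is right for the minus strand depends on whether the coordinate being shifted is the inclusive 5' base or an exclusive end, so it is worth checking which one your tool reports.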
The paper _Transposition of native chromatin for multimodal regulatory analysis and personal epigenomics_ said:
But I am still confused about why peak calling and footprinting need reads to represent the center of the transposon binding event.