Question

Why should ATAC-seq mapped reads be shifted +4 and -5 for the + strand and - strand, respectively?

4.6 years ago

progistar ▴ 40

Hello,

I have started to analyze ATAC-seq data and I have a quick question about the tag-shift step in data processing. To the best of my knowledge, there is a 9-bp duplication created by DNA repair of the nick made by Tn5 transposase.

To achieve base-pair resolution of TF footprints, researchers apply tag-shift processing (shifting the mapped read position by +4 and -5 for the + strand and - strand, respectively); however, I do not understand why the + strand and - strand should be shifted by different amounts.

Can anyone answer my question?

Thanks for your effort!

ATAC-seq • 5.4k views

score 4 · Answer 1 · 2020-12-02

The reason is that Tn5 binds as a dimer, and its two cut sites are separated by a 9 bp spacer.

My guess is that the choice is arbitrary; you are probably free to shift -4 and +5 instead. Your peaks would still fall in the same regions, and I don't think anyone is doing single-nucleotide-resolution ATAC-seq analysis.

See the following (Figure 4): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3046479/

Here is another mention: "Previous descriptions of the Tn5 transposase show that the transposon binds as a dimer and inserts two adapters separated by 9 bps." https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3959825/

I'm pretty sure the following is wrong, but I included it anyway: "Lastly, reads should be shifted + 4 bp and − 5 bp for positive and negative strand respectively, to account for the 9-bp duplication created by DNA repair of the nick by Tn5 transposase and achieve base-pair resolution of TF footprint and motif-related analyses..." https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1929-3
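For concreteness, here is a minimal sketch of the shift being discussed. It assumes BED-style alignments (0-based, half-open) and moves only the 5' end of each read; `shift_read` and the example reads are hypothetical names and values for illustration, not taken from any of the cited pipelines.

```python
def shift_read(start, end, strand, plus_shift=4, minus_shift=-5):
    """Shift the 5' end of a BED-style (0-based, half-open) alignment.

    Assumption: only the 5' end moves. For a + strand read that is `start`;
    for a - strand read it is `end`.
    """
    if strand == "+":
        return start + plus_shift, end
    if strand == "-":
        return start, end + minus_shift
    raise ValueError(f"unknown strand: {strand!r}")


# Hypothetical reads: (chrom, start, end, strand)
reads = [("chr1", 100, 150, "+"), ("chr1", 120, 170, "-")]
shifted = [(chrom, *shift_read(s, e, strand), strand)
           for chrom, s, e, strand in reads]
print(shifted)
```

Some tools shift both ends of the fragment rather than only the 5' end, so check what your pipeline does before mixing outputs.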

score 0 · Answer 2 · 2023-09-29

21 months ago

suragnair • 0

It turns out +4/-4 is the correct shift. If you make separate bigWig tracks from the + and - strand reads, the two tracks line up with each other for +4/-4 and are off by one for +4/-5. The main idea is that a transposition event should be mapped to the same base pair regardless of strand.
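The strand-agreement check can be sketched on simulated data. This assumes a simplified model in which the two reads flanking one insertion overlap by exactly 9 bp, and it records both 5' ends as 1-based inclusive positions; the event positions and function names are made up for illustration.

```python
from collections import Counter

DUPLICATION = 9  # assumed length of the Tn5 target-site duplication


def five_prime_ends(first_dup_base):
    """1-based 5' end positions of the two reads flanking one insertion."""
    plus_5p = first_dup_base                     # first base of the + strand read
    minus_5p = first_dup_base + DUPLICATION - 1  # last base covered by the - strand read
    return plus_5p, minus_5p


def pileup(events, plus_shift, minus_shift):
    """Count shifted 5' ends per position, separately for each strand."""
    plus, minus = Counter(), Counter()
    for p in events:
        plus_5p, minus_5p = five_prime_ends(p)
        plus[plus_5p + plus_shift] += 1
        minus[minus_5p + minus_shift] += 1
    return plus, minus


events = [1000, 1000, 1250, 1400]  # hypothetical insertion sites
plus_a, minus_a = pileup(events, 4, -4)
plus_b, minus_b = pileup(events, 4, -5)
print(plus_a == minus_a)  # strands agree with +4/-4 under this convention
print(plus_b == minus_b)  # strands disagree with +4/-5 under this convention
```

The result depends on the 1-based inclusive convention used for the - strand 5' end; a different convention for that coordinate changes which shift aligns the strands.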

I think you are missing something fundamental about the molecular reaction underlying the index insertion. Each fragment is two separate cuts and I don’t think you are guaranteed to capture both ends in the sequencing.

Reply • 21 months ago by benformatics 4.1k

Each fragment comes from two cuts (one on each of the + and - strands), which you capture if you do paired-end sequencing. Conversely, each cut gives rise to two fragments, one on either side (one + and one -), and of course you may or may not capture both of them for the same cell. However, if you're looking at a bulk sample with reads from many cells, and given that Tn5 has sharp sequence preferences, you would likely see a cut at the same genomic coordinate from different cells on different strands.

Regardless, you would like the same cut event to be mapped to the same base regardless of strandedness. In the plots above, you can see that the strands align only for one specific shift (which happens to be +4/-4).

Reply • 21 months ago by suragnair • 0

That's an interesting observation. Can it be specific to the way you create the bigwig files?

It seems to me that the confusion comes from the fact that the cut sites are located between the nucleotides, but we need to relate them to nucleotide positions. So on the + strand the cut site is located before the first position, but on the - strand it's located after the last nucleotide position. The center of the transposition event on the other hand is located over a specific nucleotide. So keeping the discrepancy above in mind the +4/-5 shift makes sense and should produce the correct strand-independent coordinates for the center nucleotide.
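The boundary-versus-nucleotide argument can be checked with a little arithmetic. The sketch below assumes BED-style coordinates (0-based `start`, exclusive `end`) and a 9 bp overlap between the two reads flanking one insertion; the function names and the position `p` are hypothetical.

```python
DUPLICATION = 9  # assumed length of the Tn5 target-site duplication


def bed_ends(first_dup_base_1based):
    """BED-style coordinates of the two 5' ends flanking one insertion."""
    plus_start = first_dup_base_1based - 1   # 0-based index of the first + read base
    minus_end = plus_start + DUPLICATION     # exclusive end: boundary after the last - read base
    return plus_start, minus_end


def center_index(first_dup_base_1based):
    """0-based index of the middle base of the 9 bp duplication."""
    return first_dup_base_1based - 1 + DUPLICATION // 2


p = 1000  # hypothetical 1-based position of the first duplicated base
plus_start, minus_end = bed_ends(p)
center = center_index(p)

# Using the exclusive end for the - strand, +4/-5 lands both reads on the center base.
print(plus_start + 4 == center, minus_end - 5 == center)
# Using the last covered base (end - 1) instead, the - strand shift becomes -4.
print((minus_end - 1) - 4 == center)
```

Under these assumptions, +4/-5 and +4/-4 point at the same nucleotide; which one is right depends on whether the - strand 5' end is recorded as an exclusive boundary or as the last covered base.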

Reply • 15 months ago by Maksim ▴ 10

As I said above, it depends on how you count. IGV is 1-based and BED files are 0-based, so the answer depends on the coordinate system. At the molecular level it is of course the exact same position. The transposome does not know about + and - strands; that is a naming convention we impose on the DNA, not a molecular property.

Reply • 21 months ago by ATpoint 88k

So long as one's pipeline is consistent throughout, the shift is +4/-4 regardless of 1-based or 0-based indexing. But you're right that switching between 1-based and 0-based coordinates for the same file can mix things up. +4/-4 seems to be the right shift for standard ATAC and scATAC pipelines though.

Agree with the molecular level event being the same. However, you'd like the same event to be mapped to the same base in the genome, regardless of + and - strand which is a computational feature and not a molecular aspect as you mentioned. That is only achieved at specific shifts as you can see in the image above.

Reply • 21 months ago by suragnair • 0