Why should ATAC-seq mapped reads be shifted +4 and -5 for the + strand and - strand, respectively?
3.0 years ago
progistar ▴ 40

Hello,

I have started to analyze ATAC-seq data and I have a quick question about the tag-shift step in data processing. To the best of my knowledge, a 9-bp duplication is created when the nicks left by Tn5 transposase are repaired.

To achieve base-pair resolution of TF footprints, researchers apply a tag shift (mapped read positions are shifted by +4 and -5 for the + strand and - strand, respectively); however, I do not understand why the + strand and - strand should be shifted by different amounts.

Can anyone answer my question?

Thanks for your effort!

ATAC-seq • 2.9k views
3.0 years ago

The reason is that Tn5 binds as a dimer, and its two cut sites are separated by a 9 bp spacer.

My guess is that the decision is arbitrary; you are probably free to shift -4 and +5 instead. Your peaks would still be in the same regions, and I don't think anyone is doing single-nt resolution ATAC-seq analysis.

See the following (Figure 4): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3046479/

Here is another mention: "Previous descriptions of the Tn5 transposase show that the transposon binds as a dimer and inserts two adapters separated by 9 bps. " https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3959825/

I'm pretty sure the following is wrong but I included it anyway: "Lastly, reads should be shifted + 4 bp and − 5 bp for positive and negative strand respectively, to account for the 9-bp duplication created by DNA repair of the nick by Tn5 transposase and achieve base-pair resolution of TF footprint and motif-related analyses..." https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1929-3


To clarify, the goal of the shifting is to identify the center of the Tn5 dimer complex binding event.
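As a rough illustration of the shift being discussed, here is a minimal Python sketch that offsets aligned reads by strand. The function name `shift_read`, the tuple layout, and the example coordinates are hypothetical and chosen only for illustration; the sketch also assumes the whole read interval is moved by the same offset, which is one common reading of the +4/-5 convention and not necessarily what any particular pipeline does.

```python
def shift_read(start, end, strand, plus_shift=4, minus_shift=-5):
    """Offset an aligned read, given as a (start, end) interval, by strand.

    The defaults follow the conventional +4 / -5 shift quoted in this thread.
    """
    if strand == "+":
        offset = plus_shift
    elif strand == "-":
        offset = minus_shift
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return start + offset, end + offset


# Hypothetical reads: (chromosome, start, end, strand)
reads = [("chr1", 100, 150, "+"), ("chr1", 59, 109, "-")]

shifted = [
    (chrom, *shift_read(start, end, strand), strand)
    for chrom, start, end, strand in reads
]

for record in shifted:
    print(record)
```

Passing `minus_shift=-4` makes it easy to compare the two conventions on the same reads.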


Thanks for the references. I agree that single-nt resolution does not matter for finding TF binding events.

9 weeks ago
suragnair • 0

It turns out that +4/-4 is the correct shift. If you make separate bigWig tracks from the + and - strand reads, the two profiles line up with each other for +4/-4 and are off by one for +4/-5. The main idea is that transposition events should be mapped to the same base pair regardless of strand.
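The per-strand comparison described here can be mimicked with a toy model. The sketch below is only an illustration under a stated assumption: each insertion produces one + strand read and one - strand read whose 5'-most bases sit at the two ends of the 9-bp duplicated segment. The names `DUP` and `strand_profiles` and the insertion positions are hypothetical, and positions are single 0-based base indices, not interval ends.

```python
from collections import Counter

DUP = 9  # assumed length of the Tn5 target-site duplication


def strand_profiles(insertions, plus_shift, minus_shift):
    """Count shifted 5' read ends per strand.

    Each insertion is the 0-based index of the first duplicated base.
    Assumption: the + read's 5' base is at p and the - read's 5' base
    is at p + DUP - 1.
    """
    plus = Counter(p + plus_shift for p in insertions)
    minus = Counter(p + DUP - 1 + minus_shift for p in insertions)
    return plus, minus


insertions = [100, 100, 250, 400]  # hypothetical insertion sites

plus_44, minus_44 = strand_profiles(insertions, 4, -4)
plus_45, minus_45 = strand_profiles(insertions, 4, -5)

print("+4/-4 profiles identical:", plus_44 == minus_44)
print("+4/-5 profiles identical:", plus_45 == minus_45)
```

In this model the two strand profiles coincide for +4/-4 and are displaced by one base for +4/-5, which matches the off-by-one described above when the shift is applied to 5' base positions.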


I think you are missing something fundamental about the molecular reaction underlying the index insertion. Each fragment is two separate cuts and I don’t think you are guaranteed to capture both ends in the sequencing.


Each fragment comes from two cuts (one read on the + strand and one on the - strand), and you capture both if you do paired-end sequencing. Conversely, each cut gives rise to two fragments, one on either side (one + and one -), and of course you may or may not capture both of them for the same cell. However, if you're looking at a bulk sample with reads from many cells, and given that Tn5 has sharp sequence preferences, you would likely see a cut at the same genomic coordinate from different cells on different strands.

In any case, you would like the same cut event to be mapped to the same base regardless of strand. If you look at the plots above, you can see that the strands align only for one specific shift (which happens to be +4/-4).


As I said above, it depends on how you count. IGV is 1-based and BED files are 0-based, so it depends on the coordinate system. On the molecular level it is of course the exact same position. The transposome does not know about + and - strands; that is a naming convention we put on DNA, not a molecular property.


So long as one's pipeline is consistent throughout, the shift is +4/-4 regardless of 1-based or 0-based indexing. But you're right that switching between 1-based and 0-based coordinates within the same file can mix things up. +4/-4 seems to be the right shift for standard ATAC and scATAC pipelines though.

Agree with the molecular level event being the same. However, you'd like the same event to be mapped to the same base in the genome, regardless of + and - strand which is a computational feature and not a molecular aspect as you mentioned. That is only achieved at specific shifts as you can see in the image above.
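One possible way to reconcile the two conventions is to look at which coordinate the shift is applied to. The sketch below is an assumption-laden toy, not a description of any specific pipeline: it models a single insertion with a 9-bp duplication, writes both reads as 0-based half-open intervals (BED style), and compares shifting the 5'-most base with shifting the raw interval coordinates. `reads_from_insertion`, `five_prime_base`, and the read length are hypothetical.

```python
DUP = 9  # assumed length of the Tn5 target-site duplication


def reads_from_insertion(p, read_len=50):
    """Return (plus_read, minus_read) as 0-based half-open (start, end).

    p is the 0-based index of the first base of the duplicated segment.
    """
    plus_read = (p, p + read_len)               # 5' base is index p
    minus_read = (p + DUP - read_len, p + DUP)  # 5' base is index p + 8
    return plus_read, minus_read


def five_prime_base(read, strand):
    """0-based index of the read's 5'-most base."""
    start, end = read
    return start if strand == "+" else end - 1


plus_read, minus_read = reads_from_insertion(100)

# Convention A: shift the 5'-most base by +4 / -4.
site_plus_a = five_prime_base(plus_read, "+") + 4
site_minus_a = five_prime_base(minus_read, "-") - 4

# Convention B: shift the interval start by +4 and the exclusive end by -5.
site_plus_b = plus_read[0] + 4
site_minus_b = minus_read[1] - 5

print(site_plus_a, site_minus_a, site_plus_b, site_minus_b)
```

Under these assumptions all four values land on the same base, because the exclusive end of a half-open interval is already one past the read's last base. That suggests +4/-4 on base positions and +4/-5 on half-open interval coordinates can describe the same operation, while mixing the two produces the off-by-one.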
