Why should ATAC-seq mapped reads be shifted by +4 and -5 bp on the + and - strands, respectively?
Question by progistar (3.4 years ago)

Hello,

I have started analyzing ATAC-seq data and have a quick question about the tag-shift step in data processing. To the best of my knowledge, DNA repair of the nicks made by the Tn5 transposase creates a 9-bp duplication.

To achieve base-pair resolution of TF footprints, researchers routinely apply a tag shift (shifting mapped read positions by +4 bp on the + strand and -5 bp on the - strand); however, I do not understand why the two strands should be shifted by different amounts.

Can anyone answer my question?

Thanks for your effort!
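
A minimal sketch of the shift the question describes, assuming reads are held as BED-style intervals (0-based start, exclusive end) with a strand field. The `Read` type and `tn5_shift` function are hypothetical names for illustration. In this representation the 5' end of a + strand read is `start` and the 5' end of a - strand read is `end`, so the conventional shift moves `start` by +4 on the + strand and `end` by -5 on the - strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive (BED-style half-open)
    strand: str  # "+" or "-"


def tn5_shift(read: Read, plus_shift: int = 4, minus_shift: int = -5) -> Read:
    """Move the 5' end of a read by a strand-specific offset.

    Only the 5' end marks the Tn5 insertion, so only that coordinate changes:
    `start` for + strand reads and `end` for - strand reads.
    """
    if read.strand == "+":
        return Read(read.chrom, read.start + plus_shift, read.end, read.strand)
    if read.strand == "-":
        return Read(read.chrom, read.start, read.end + minus_shift, read.strand)
    raise ValueError(f"unknown strand: {read.strand!r}")


reads = [Read("chr1", 100, 150, "+"), Read("chr1", 200, 250, "-")]
for r in reads:
    print(tn5_shift(r))
```

Whether to move only the 5' end or the whole read is a design choice; this sketch moves only the 5' end, since that is the coordinate footprinting and cut-site counting use.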

Tags: ATAC-seq

Answer (3.4 years ago)

The reason is that Tn5 binds as a dimer, with a 9-bp spacer between the two cut sites.

My guess is that the choice is arbitrary; you are probably free to shift by -4 and +5 instead. Your peaks would still be in the same regions, and I don't think anyone is doing single-nucleotide-resolution ATAC-seq analysis.

See the following (Figure 4): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3046479/

Here is another mention: "Previous descriptions of the Tn5 transposase show that the transposon binds as a dimer and inserts two adapters separated by 9 bps. " https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3959825/

I'm pretty sure the following is wrong, but I'm including it anyway: "Lastly, reads should be shifted + 4 bp and − 5 bp for positive and negative strand respectively, to account for the 9-bp duplication created by DNA repair of the nick by Tn5 transposase and achieve base-pair resolution of TF footprint and motif-related analyses..." https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1929-3
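
To make the 9-bp spacer concrete, here is a toy string model. It assumes the two strand cuts are staggered by exactly 9 bp starting at 0-based offset `x`, and that gap repair copies the 9 bases between them onto both resulting fragments. The function `insert_tn5` and the toy sequence are hypothetical, and adapter sequence is left out.

```python
def insert_tn5(genome: str, x: int, spacer: int = 9):
    """Return (left_fragment, right_fragment, duplicated_bases) for an insertion at offset x.

    Toy model: after gap repair, genome[x:x + spacer] appears at the right end
    of the left fragment and at the left end of the right fragment.
    """
    if not 0 <= x <= len(genome) - spacer:
        raise ValueError("insertion too close to the end of the sequence")
    duplicated = genome[x:x + spacer]
    # The read from this end maps to the - strand; its 5' end is base x + spacer - 1.
    left = genome[:x + spacer]
    # The read from this end maps to the + strand; its 5' end is base x.
    right = genome[x:]
    return left, right, duplicated


genome = "ACGTACGTTAGCCATGGATCCAAGT"  # toy sequence, not real data
x = 6
left, right, dup = insert_tn5(genome, x)
center = x + 9 // 2  # middle base of the 9-bp duplication (0-based)
print(dup, center, genome[center])
```

In this model the + strand read starts at base `x` and the - strand read ends at base `x + 8`, so each read's 5' base sits 4 bases away from the center base `x + 4`.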

Reply

To clarify, the goal of the shifting is to identify the center of the Tn5 dimer complex binding event.

Reply

Thanks for the references. I agree that single-nucleotide resolution does not matter for finding TF binding events.

Answer by suragnair (6 months ago)

It turns out that +4/-4 is the correct shift. You can see this when you build separate bigWig tracks from the + and - strand reads: the two tracks line up for +4/-4 but are off by one for +4/-5. The main idea is that a transposition event should map to the same base pair regardless of strand.
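
The per-strand comparison described here can be sketched as a small search: shift the + strand 5' ends by +4, then try several minus-strand offsets and keep the one whose shifted sites overlap the + strand sites most. This is a hypothetical helper (`best_minus_shift`) run on toy events generated under the 9-bp stagger model, not real data, so it only illustrates the method. Note that the winning offset depends on whether the - strand 5' end is recorded as its last base or as a BED-style exclusive end.

```python
from collections import Counter


def best_minus_shift(plus_5p, minus_5p, plus_shift=4, candidates=range(-8, 1)):
    """Pick the minus-strand offset whose shifted 5' ends best overlap the shifted + strand ends.

    plus_5p / minus_5p: 5'-end coordinates of + and - strand reads.
    Returns (best_shift, overlap_by_shift).
    """
    plus_sites = Counter(p + plus_shift for p in plus_5p)
    overlap = {}
    for s in candidates:
        minus_sites = Counter(m + s for m in minus_5p)
        # Number of shifted sites shared by the two strand-specific "tracks".
        overlap[s] = sum(min(n, minus_sites[pos]) for pos, n in plus_sites.items())
    best = max(overlap, key=overlap.get)
    return best, overlap


# Toy events: for an event centered at base c, the + strand read starts at
# base c - 4 and the - strand read ends at base c + 4 (0-based, inclusive).
centers = [120, 135, 300, 301, 452]
plus_5p = [c - 4 for c in centers]
minus_last_base = [c + 4 for c in centers]  # 0-based inclusive last base
minus_bed_end = [c + 5 for c in centers]    # the same base written as a BED exclusive end

print(best_minus_shift(plus_5p, minus_last_base)[0])
print(best_minus_shift(plus_5p, minus_bed_end)[0])
```

Because the toy events bake in the model, this cannot settle which convention real data follow; on real reads you would feed in the actual 5' ends from each strand.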

Reply

I think you are missing something fundamental about the molecular reaction underlying the index insertion. Each fragment results from two separate cuts, and I don't think you are guaranteed to capture both ends in the sequencing.

Reply

Each fragment results from two cuts (one read from each end, on the + and - strands), which you capture if you do paired-end sequencing. Conversely, each cut gives rise to two fragments, one on either side (one + and one -), and of course you may or may not capture both of them from the same cell. However, if you're looking at a bulk sample with reads from many cells, and given that Tn5 has sharp sequence preferences, you would likely see cuts at the same genomic coordinate from different cells on different strands.

Regardless, you want the same cut event to be mapped to the same base regardless of strand. In the plots above, you can see that the strands align only for one specific shift (which happens to be +4/-4).
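
With paired-end data, each fragment is bounded by two insertion events, one at each end. A minimal sketch, assuming fragments are stored BED-style as 0-based `[start, end)` intervals and using the conventional +4/-5 offsets from the question; `fragment_cut_sites` is a hypothetical helper. Note that `end - 5` is the same base as `(end - 1) - 4`, i.e. the fragment's last base shifted by 4.

```python
from typing import Iterable, Iterator, Tuple


def fragment_cut_sites(fragments: Iterable[Tuple[str, int, int]],
                       plus_shift: int = 4,
                       minus_shift: int = -5) -> Iterator[Tuple[str, int, str]]:
    """Yield (chrom, position, strand) for both Tn5 insertions bounding each fragment.

    The left end of a fragment is the 5' end of its + strand read (BED start);
    the right end is the 5' end of its - strand read (BED end, exclusive).
    """
    for chrom, start, end in fragments:
        if end <= start:
            raise ValueError(f"empty fragment: {chrom}:{start}-{end}")
        yield chrom, start + plus_shift, "+"
        yield chrom, end + minus_shift, "-"


frags = [("chr1", 1000, 1180), ("chr1", 1500, 1720)]  # toy fragments
sites = list(fragment_cut_sites(frags))
print(sites)
```

Counting these sites per base (rather than counting whole fragments) is what gives a cut-site profile for footprinting.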

Reply

As I said above, it depends on how you count. IGV displays 1-based coordinates while BED files are 0-based, so the numbers depend on the coordinate system. At the molecular level, of course, it's exactly the same position. The transposome does not know about + and - strands; that is a naming convention we impose on DNA, not a molecular property.

Reply

As long as one's pipeline is consistent throughout, the shift is +4/-4 regardless of 1-based or 0-based indexing. But you're right that converting the same file between 1-based and 0-based coordinates can mix things up. Still, +4/-4 seems to be the right shift for standard ATAC and scATAC pipelines.

I agree that the event is the same at the molecular level. However, you want the same event to be mapped to the same base in the genome regardless of + or - strand, which, as you mentioned, is a computational feature rather than a molecular one. That is only achieved at specific shifts, as you can see in the image above.
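
The point about consistency can be checked with a small round trip. This assumes the usual conversion between BED-style 0-based half-open intervals and 1-based closed intervals (the form genome browsers such as IGV display): `[start, end)` becomes `[start + 1, end]`. The helper names are hypothetical. Shifting in either system and converting consistently lands on the same base, for either pair of offsets.

```python
from typing import Tuple


def bed_to_one_based(start: int, end: int) -> Tuple[int, int]:
    """0-based half-open [start, end) -> 1-based closed [start + 1, end]."""
    return start + 1, end


def one_based_to_bed(start: int, end: int) -> Tuple[int, int]:
    """1-based closed [start, end] -> 0-based half-open [start - 1, end)."""
    return start - 1, end


def shift_5p(start: int, end: int, strand: str, plus: int, minus: int) -> Tuple[int, int]:
    # Move only the 5' end: start for + strand reads, end for - strand reads.
    return (start + plus, end) if strand == "+" else (start, end + minus)


for plus, minus in [(4, -5), (4, -4)]:
    for strand, s, e in [("+", 100, 150), ("-", 200, 250)]:
        direct = shift_5p(s, e, strand, plus, minus)
        roundtrip = one_based_to_bed(*shift_5p(*bed_to_one_based(s, e), strand, plus, minus))
        print(plus, minus, strand, direct, roundtrip)
```

What does change the result is mixing conventions within one step, for example reading the - strand 5' end as the BED end in one place and as the last base in another; that introduces exactly a one-base offset.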

Reply

That's an interesting observation. Could it be specific to the way you create the bigWig files?

It seems to me that the confusion comes from the fact that cut sites lie between nucleotides, but we need to relate them to nucleotide positions. On the + strand the cut site lies before the first position of the read, but on the - strand it lies after the last position. The center of the transposition event, on the other hand, sits on a specific nucleotide. Keeping this discrepancy in mind, the +4/-5 shift makes sense and should produce the correct strand-independent coordinates for the center nucleotide.
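
The boundary-versus-nucleotide argument in this reply can be written out as arithmetic. A minimal sketch, assuming BED-style 0-based half-open read coordinates and the 9-bp stagger model (the + strand cut at boundary `x`, the - strand read ending at exclusive coordinate `x + 9`); the function names and the value of `x` are hypothetical. Under this model, +4/-5 applied to the boundary coordinates and +4/-4 applied to the first/last nucleotides select the same center base.

```python
def center_from_plus_read(bed_start: int) -> int:
    """Center nucleotide (0-based) from a + strand read's BED start.

    The + strand cut is the boundary just before base `bed_start`, so the
    center of the 9-bp duplication is 4 bases into the read.
    """
    return bed_start + 4


def center_from_minus_read(bed_end: int) -> int:
    """Center nucleotide (0-based) from a - strand read's BED end (exclusive).

    The - strand cut is the boundary just after the last base, bed_end - 1.
    Stepping back 5 from the boundary is the same as stepping back 4 from
    the last base.
    """
    return bed_end - 5


x = 1000                 # hypothetical + strand cut boundary
plus_start = x           # + strand read begins right after the boundary
minus_end = x + 9        # - strand read ends 9 bp further along (exclusive end)
minus_last_base = minus_end - 1

print(center_from_plus_read(plus_start), center_from_minus_read(minus_end), minus_last_base - 4)
```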
