Question

Why do peak shifts occur?

2

6.2 years ago

anikethsuresh ▴ 20

Why do positive and negative tags, after tag generation, shift peaks to the middle of the two tag locations? As in tag

Peak Calling • 4.0k views

updated 6.2 years ago by Friederike 8.9k • written 6.2 years ago by anikethsuresh ▴ 20

0

Basically, why do the tags extend in the direction of the other polarity tags? And why do they even have to be shifted there?

6.2 years ago by anikethsuresh ▴ 20

0

It is not clear what you are asking...

In the protocol, the regions of DNA where the protein of interest has bound will be 'cut' [excised] and then sequenced - both the coding and non-coding strands are sequenced, and both from their respective 5' end.

When we align these reads back to the genome, we can determine the strand from which each read originated [i.e. coding or non-coding]. When an in silico aligner looks at a read, it is aware that either the read or its reverse-complement may align, and through this we can infer the strand from which it originated.
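To make the strand inference concrete, here is a toy sketch, not how a real aligner works internally: it uses exact string matching against a made-up reference, and the names `infer_strand` and `reference` are purely illustrative. The idea is simply that if the read itself matches, it came from the forward strand; if only its reverse-complement matches, it came from the reverse strand.

```python
def reverse_complement(seq):
    """Return the reverse-complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def infer_strand(read, reference):
    """Return (strand, 0-based leftmost position), or None if nothing matches.

    Toy assumption: exact matches only; real aligners tolerate mismatches
    and gaps and use an index instead of a linear scan.
    """
    pos = reference.find(read)
    if pos != -1:
        return "+", pos
    pos = reference.find(reverse_complement(read))
    if pos != -1:
        return "-", pos
    return None


# Made-up reference sequence, for illustration only
reference = "ACGTTGCAAGGCTTACCGGATATCGA"

forward_read = "GCAAGGCT"                        # copied straight from the reference
reverse_read = reverse_complement("ACCGGATA")    # as sequenced from the other strand

print(infer_strand(forward_read, reference))
print(infer_strand(reverse_read, reference))
```

Note that the reported position is always the leftmost coordinate on the reference, so for a reverse-strand read the 5' end of the read sits at the right-hand end of the match.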

For peak merging, the algorithms will look at metrics such as peak height, peak width, peak density, etc. before deciding whether two peaks relate to the same original protein contact point.

6.2 years ago by Kevin Blighe 87k

score 8 · Answer 1 · 2018-02-27

You should look at the image a bit more patiently; it contains the answer to your question (if I understood your minimalist question right). Keep in mind that this issue stems from the time when reads were still around 36 bp, not 100 bp as they are today.

We want to know where the yellow bubble has bound to the DNA. We enrich for the DNA-bubble-complex and digest away the bubble. What we're left with is pieces of double-stranded DNA where the bubble was bound. For the sake of simplicity, we can imagine that the bubble had bound exactly in the middle of the fragment, just as it is depicted above.

Now, the DNA fragment that we enriched was longer than 36 bp, say, 500 bp. The long enriched fragments will be broken up into smaller pieces, which will still be longer than what we could sequence with 36 bp reads (say, 200 bp). So all we were going to see in the raw data were the 36 bp at the 5' ends of those 200 bp fragments. Since the original enriched piece of DNA was double-stranded, we will have reads from both the forward and the reverse strand. As the image above nicely shows, the region where the yellow bubble was will be in the middle between those ends. If you took the pile-up of those 36 bp tags at face value, you would see the strongest enrichments _around_ the yellow bubble, not at its actual binding site.
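A small numeric sketch of that situation may help. Everything here is made up for illustration: the binding site coordinate, the evenly spaced fragment starts, and the helper name `five_prime_tags`. It assumes 200 bp fragments that all contain the binding site and 36 bp reads taken from each fragment's two 5' ends.

```python
from statistics import mean

BINDING_SITE = 1000   # hypothetical coordinate of the "yellow bubble"
FRAGMENT_LEN = 200    # assumed fragment size after shearing
READ_LEN = 36         # short reads, as in early ChIP-seq data


def five_prime_tags(fragment_starts, fragment_len=FRAGMENT_LEN):
    """Return the 5' tag positions of the forward and reverse read per fragment.

    The forward read starts at the left end of the fragment; the reverse
    read starts at the right end and runs leftwards.
    """
    forward = list(fragment_starts)
    reverse = [start + fragment_len - 1 for start in fragment_starts]
    return forward, reverse


# Fragments that all contain the binding site, starting 150 to 50 bp upstream
fragment_starts = list(range(BINDING_SITE - 150, BINDING_SITE - 49, 10))
forward, reverse = five_prime_tags(fragment_starts)

# Does any 36 bp read actually cover the binding site?
forward_covers = any(p <= BINDING_SITE < p + READ_LEN for p in forward)
reverse_covers = any(p - READ_LEN < BINDING_SITE <= p for p in reverse)

print("mean forward tag:", mean(forward))
print("mean reverse tag:", mean(reverse))
print("site covered by a read:", forward_covers or reverse_covers)
```

In this toy setup the forward tags pile up upstream of the site and the reverse tags downstream of it, and none of the short reads overlaps the site itself, which is the bimodal picture described above.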

In order to pin down the location of the yellow bubble, the ends of the forward-strand reads and the ends of the reverse-strand reads were therefore shifted towards each other, assuming an average fragment size, e.g. 200 bp. This was just meant to sharpen the signal, because without the shift the signal would be artificially broadened (i.e., it would include the fringes of the original fragment and it would be somewhat bimodal, with the valley between the two peaks actually corresponding to the region that's more likely to contain the actual binding site).
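As a rough sketch of the shift itself: the version below assumes each tag is moved by half of the assumed fragment size towards its own 3' end, which is one common way to implement "shifting towards each other". The function name `shift_tags` and the example coordinates are hypothetical, and real peak callers usually estimate the fragment size from the data rather than hard-coding it.

```python
def shift_tags(tags, fragment_len=200):
    """Shift 5' tag positions by half the assumed fragment length.

    tags: list of (five_prime_position, strand) with strand "+" or "-".
    Forward-strand tags move right, reverse-strand tags move left, so both
    end up near the middle of the fragment they came from.
    """
    half = fragment_len // 2
    shifted = []
    for pos, strand in tags:
        if strand == "+":
            shifted.append(pos + half)
        elif strand == "-":
            shifted.append(pos - half)
        else:
            raise ValueError(f"unknown strand: {strand!r}")
    return shifted


# Hypothetical tags flanking a binding site near position 1000
tags = [(900, "+"), (920, "+"), (1099, "-"), (1079, "-")]
print(shift_tags(tags))
```

After the shift, tags from both strands land close to the same coordinate, so the two flanking pile-ups merge into a single, sharper peak.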