Question: Why do peak shifts occur?
12 months ago by
anikethsuresh10 wrote:

Why do positive and negative tags, after tag generation, shift peaks to the middle of the two tag locations? As intag

peak calling • 971 views
modified 12 months ago by Friederike3.2k • written 12 months ago by anikethsuresh10

Basically, why do the tags extend in the direction of the other polarity tags? And why do they even have to be shifted there?

written 12 months ago by anikethsuresh10

It is not clear what you are asking...

In the protocol, the regions of DNA where the protein of interest has bound will be 'cut' [excised] and then sequenced - both the coding and non-coding strands are sequenced, and both from their respective 5' end.

When we align these reads back to the genome, we will be capable of determining the original strand from which the reads originated [i.e. coding or non-coding]. When an in silico aligner looks at a read, it is aware that either the read or its reverse-complement may align, and through this we can infer the strand from which it originated.
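
The strand inference described above can be sketched in a few lines. This is only an illustration with exact string matching on a toy reference; real aligners use indexed, mismatch-tolerant search, and the function names `reverse_complement` and `infer_strand` are made up for this example.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def infer_strand(read: str, reference: str):
    """Toy strand inference by exact matching (hypothetical helper).

    Returns ("+", position) if the read itself matches the reference,
    ("-", position) if only its reverse complement matches,
    or None if neither is found.
    """
    pos = reference.find(read)
    if pos != -1:
        return "+", pos
    pos = reference.find(reverse_complement(read))
    if pos != -1:
        return "-", pos
    return None


reference = "GATTACAGGCTCCATG"
forward_read = "TACAGG"   # matches the reference as written
reverse_read = "TGGAGC"   # only its reverse complement (GCTCCA) matches
print(infer_strand(forward_read, reference))
print(infer_strand(reverse_read, reference))
```

A read that matches directly is assigned to the forward strand; one that matches only after reverse-complementing is assigned to the reverse strand.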

For peak merging, the algorithms will look at metrics such as peak height, peak width, peak density, etc. before deciding if 2 peaks relate to the same original protein contact point.

modified 12 months ago • written 12 months ago by Kevin Blighe39k
12 months ago by
United States
Friederike3.2k wrote:

You should look at the image a bit more patiently; it contains the answer to your question (if I understood your minimalist question right). Keep in mind that this issue stems from the time when reads were still around 36 bp, not 100 bp as they are today.

We want to know where the yellow bubble has bound to the DNA. We enrich for the DNA-bubble-complex and digest away the bubble. What we're left with is pieces of double-stranded DNA where the bubble was bound. For the sake of simplicity, we can imagine that the bubble had bound exactly in the middle of the fragment, just as it is depicted above.

Now, the DNA fragment that we enriched was longer than 36 bp, say, 500 bp. The long enriched fragments will be broken up into smaller pieces, which will still be longer than what we could sequence with 36 bp reads (say, 200 bp). So, all we were going to see in the raw data were 36 bp of the 5' ends of those 200 bp fragments. Since the original enriched piece of DNA was double-stranded, we will have fragments from both the forward and the reverse strand. As the image above nicely shows, the region where the yellow bubble was will be in the middle between those ends. If you took the pile-up of those 36 bp tags at face value, you would see the strongest enrichments _around_ the yellow bubble, not at its actual binding site.

In order to pin down the location of the yellow bubble, the ends of the forward-strand reads and the ends of the reverse-strand reads were therefore shifted towards each other, assuming an average fragment size, e.g. 200 bp. This was just meant to sharpen the signal because without the shift, the signal would be artificially broadened (i.e., it would include the fringes of the original fragment and it would be somewhat bimodal, with the valley in between the two peaks actually corresponding to the region that's more likely to contain the actual binding site).
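
To make the shift concrete, here is a small simulation sketch. It is not how any particular peak caller implements it; all names (`simulate_tags`, `shift_tags`) and numbers are assumptions chosen for illustration: fragments of roughly 200 bp that each cover a single binding site, with the site falling anywhere within the fragment, and a shift of half the assumed fragment size per read.

```python
import random


def simulate_tags(site, n_fragments=2000, mean_len=200, sd_len=20, seed=0):
    """Simulate 5' tag positions for fragments covering one binding site.

    Assumption for this sketch: the site falls uniformly somewhere inside
    each fragment, and each fragment is read from either the forward
    strand (tag = leftmost base) or the reverse strand (tag = rightmost base).
    """
    rng = random.Random(seed)
    tags = []
    for _ in range(n_fragments):
        length = max(50, int(rng.gauss(mean_len, sd_len)))
        start = site - rng.randint(0, length - 1)
        end = start + length - 1
        if rng.random() < 0.5:
            tags.append((start, "+"))  # 5' end of a forward-strand read
        else:
            tags.append((end, "-"))    # 5' end of a reverse-strand read
    return tags


def shift_tags(tags, fragment_len):
    """Shift tags towards each other by half the assumed fragment length."""
    half = fragment_len // 2
    return [(pos + half, s) if s == "+" else (pos - half, s) for pos, s in tags]


def mean_position(tags, strand):
    positions = [pos for pos, s in tags if s == strand]
    return sum(positions) / len(positions)


site = 10_000
raw = simulate_tags(site)
shifted = shift_tags(raw, fragment_len=200)

print("raw     + strand mean:", round(mean_position(raw, "+")))
print("raw     - strand mean:", round(mean_position(raw, "-")))
print("shifted + strand mean:", round(mean_position(shifted, "+")))
print("shifted - strand mean:", round(mean_position(shifted, "-")))
```

Before the shift, the forward-strand tags pile up about half a fragment length upstream of the site and the reverse-strand tags about half a fragment length downstream, which is the bimodal pattern described above. After the shift, both strands centre on the site.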

written 12 months ago by Friederike3.2k

Thanks for the explanation!

written 12 months ago by anikethsuresh10

you're welcome! glad to see it may have helped.

written 12 months ago by Friederike3.2k

Is this shifting behaviour of the reads valid for RNA-seq as it is for ChIP-seq?

written 7 months ago by salamandra180

only if you seek to find binding sites of proteins on transcripts using antibody-based enrichment of your factor of interest as it is binding to RNA. generally, it would be imprecise to say that the reads are shifting -- the reads are always going to represent the ends of the (c)DNA fragments that were put onto the flow cell. the only reason reads used to be shifted computationally for ChIP-seq analysis was that we were interested in the information that was _not_ being captured, i.e. those parts of the fragments where the protein had bound, which often tended to be in the center of those fragments.

for typical RNA-seq, the main information of interest is usually just the abundance of transcripts, which can be determined based on the sequences that we do capture

written 7 months ago by Friederike3.2k