You should look at the image a bit more patiently; it contains the answer to your question (if I understood your minimalist question right). Keep in mind that this issue stems from the time when reads were still around 36 bp, not 100 bp as they are today.
We want to know where the yellow bubble has bound to the DNA. We enrich for the DNA-bubble-complex and digest away the bubble. What we're left with is pieces of double-stranded DNA where the bubble was bound. For the sake of simplicity, we can imagine that the bubble had bound exactly in the middle of the fragment, just as it is depicted above.
Now, the DNA fragment that we enriched was longer than 36 bp, say, 500 bp. The long enriched fragments will be broken up into smaller pieces (say, 200 bp), which will still be longer than what we could sequence with 36 bp reads. So, all we were going to see in the raw data were the 36 bp at the 5' ends of those 200 bp fragments. Since the original enriched piece of DNA was double-stranded, we will have fragments from both the forward and the reverse strand. As the image above nicely shows, the region where the yellow bubble was will be in the middle between those ends. If you took the pile-up of those 36 bp tags at face value, you would see the strongest enrichments _around_ the yellow bubble, not at its actual binding site.
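To make the geometry concrete, here is a minimal sketch of where the two tags of a single fragment end up. It assumes the idealised case from above (the bubble sits exactly in the middle of a 200 bp fragment, reads are 36 bp) and uses half-open coordinates; the function name `tag_intervals` and the position 1000 are made up for illustration.

```python
def tag_intervals(site, fragment_len=200, read_len=36):
    """Return (forward_tag, reverse_tag) as half-open (start, end) intervals
    for one fragment centred on `site` (idealised assumption)."""
    frag_start = site - fragment_len // 2
    frag_end = frag_start + fragment_len
    # The forward-strand read covers the left end of the fragment,
    # the reverse-strand read covers the right end.
    forward_tag = (frag_start, frag_start + read_len)
    reverse_tag = (frag_end - read_len, frag_end)
    return forward_tag, reverse_tag


def covers(interval, pos):
    """True if the half-open interval contains pos."""
    return interval[0] <= pos < interval[1]


site = 1000  # hypothetical binding-site coordinate
fwd, rev = tag_intervals(site)
print(fwd, rev)
print(covers(fwd, site), covers(rev, site))
```

In this toy case neither tag overlaps the binding site itself: one lies upstream of it and one downstream, which is why the raw pile-up peaks on either side.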
In order to pin down the location of the yellow bubble, the ends of the forward-strand reads and the ends of the reverse-strand reads were therefore shifted towards each other, assuming an average fragment size, e.g. 200 bp. This was just meant to sharpen the signal, because without the shift the signal would be artificially broadened (i.e., it would include the fringes of the original fragment, and it would be somewhat bimodal, with the valley between the two peaks actually corresponding to the region that's more likely to contain the actual binding site).
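The shift itself can be sketched in a few lines. This is only an illustration under stated assumptions: each tag's 5' position is moved by half of the assumed fragment size towards the other strand's tags, and the helper name `shift_five_prime` and the coordinates are hypothetical. Actual tools differ in how they estimate the fragment size and in whether they shift or extend the reads.

```python
def shift_five_prime(five_prime_pos, strand, fragment_len=200):
    """Move a tag's 5' position towards the presumed fragment centre.

    Assumption: the shift is half of the assumed average fragment length.
    Forward-strand tags move right, reverse-strand tags move left.
    """
    shift = fragment_len // 2
    if strand == "+":
        return five_prime_pos + shift
    if strand == "-":
        return five_prime_pos - shift
    raise ValueError("strand must be '+' or '-'")


# Toy fragment spanning 900..1100 with the binding site at 1000:
forward_shifted = shift_five_prime(900, "+")
reverse_shifted = shift_five_prime(1100, "-")
print(forward_shifted, reverse_shifted)
```

After the shift, both tags of the toy fragment point at the same position, so the two flanking peaks merge into a single one over the presumed binding site.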