Where does the correlation between strands and within strands in a DNA-seq experiment come from? I have found it's present both in the input and the ChIP sequencing data for ChIP-seq experiments. How do we interpret this correlation?
One detailed explanation I have found so far is located here:
"A typical ChIP-seq experiment would show a pronounced peak at shift distance approximately equal to the prevalent size of the DNA fragments coming off the IP. This peak indicates that the DNA fragments tend to be clustered around specific positions. In other sequencing experiments, for instance those measuring DNAase I hypersensitivity, this may not be the case: the end points of the fragments may be clustered within broader regions, however complete DNA fragments would not necessarily show strong tendency to center around specific positions. In such cases, one would expect to see a high degree of read clustering, but low strand asymmetry. A cross-correlation function for such data would look almost symmetric with respect to 0 shift, with tails on both sides comparable to those of auto-correlation function."
Another source claims you can estimate the average peak width and the fragment length by computing the auto-correlation within a strand and comparing it to the cross-correlation with the Crick strand. The peak width is taken to be the distance at which the auto-correlation drops to the same value as the intercept on the y-axis: http://biowhat.ucsd.edu/homer/chipseq/qc.html
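To make concrete what I mean by these two quantities, here is a minimal sketch of how I understand them, run on simulated reads rather than real data. Everything in it is an assumption for illustration: the function names `simulate_tag_counts` and `strand_correlation` are my own (not part of spp or HOMER), the fragment length is fixed at 150 bp, and fragments are scattered around point-like binding sites with Gaussian jitter. It is not meant to reproduce either tool's exact method.

```python
import numpy as np


def simulate_tag_counts(genome_len=200_000, n_sites=200, frags_per_site=50,
                        frag_len=150, jitter=20, seed=0):
    """Simulate per-position counts of read 5' ends on each strand.

    Hypothetical toy model: every fragment has length frag_len and is
    centered near a binding site. A fragment is sequenced either from its
    left end (plus-strand read) or from its right end (minus-strand read).
    """
    rng = np.random.default_rng(seed)
    sites = rng.integers(1_000, genome_len - 1_000, size=n_sites)
    n_frags = n_sites * frags_per_site
    offsets = rng.normal(0, jitter, n_frags).round().astype(int)
    centers = np.repeat(sites, frags_per_site) + offsets
    starts = centers - frag_len // 2   # 5' end of a plus-strand read
    ends = starts + frag_len           # 5' end of a minus-strand read
    is_plus = rng.random(n_frags) < 0.5
    plus = np.bincount(starts[is_plus], minlength=genome_len)
    minus = np.bincount(ends[~is_plus], minlength=genome_len)
    return plus, minus


def strand_correlation(a, b, max_shift):
    """Pearson correlation between a[i] and b[i + d] for d = 0..max_shift."""
    n = len(a)
    out = np.empty(max_shift + 1)
    for d in range(max_shift + 1):
        out[d] = np.corrcoef(a[: n - d], b[d:])[0, 1]
    return out


plus, minus = simulate_tag_counts()

# Cross-correlation: plus-strand counts against minus-strand counts shifted by d.
cross = strand_correlation(plus, minus, max_shift=300)

# Auto-correlation: plus-strand counts against themselves shifted by d.
auto = strand_correlation(plus, plus, max_shift=300)

print("shift with the highest cross-correlation:", int(np.argmax(cross)))
print("auto-correlation at shifts 0, 25, 50:", auto[[0, 25, 50]])
```

In this toy model the cross-correlation should peak at a shift close to the simulated fragment length, because plus- and minus-strand reads flank the same sites, while the auto-correlation should fall off over a distance set by how tightly the reads cluster. What I do not understand is how far this picture carries over to real data, and in particular to input samples.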
Can someone explain more completely why there is cross-correlation between strands and auto-correlation within strands, and what kind of information I can hope to get from this kind of analysis of DNA-seq data?