Can someone give me an explanation of how sliding windows are used for CNV analysis?
For example, suppose I'm analyzing CNVs for chrI of S. cerevisiae. I create a pileup, take the read depth at every base position, and then divide it by the average read depth. What would I do with a sliding window (e.g., with a window size of 100 bp) to make this more accurate?
When you are looking for copy number changes, you are looking for regions of the genome that have a different number of reads than expected. If one copy of a chromosome in a diploid organism has a 100 bp deletion, you would expect about half as many reads in that region as in the surrounding regions. If there is an amplification or repeat, there would be more reads in that region than in the surrounding regions.
A simple way to figure out whether there are changes in coverage (the number of reads overlapping a region of the genome) is to split the genome into bins and count how many reads fall in each bin. A change in this count relative to other bins is an indicator of a copy number alteration in that bin. If you have a 100 bp deletion but your bin size is 500,000 bp, the reduced number of reads in that bin is much harder to detect than if your bin size were 500 bp. Thus, a smaller 'window' is more sensitive when you are looking for small copy number changes.
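To make the bin-size effect concrete, here is a minimal Python sketch. It assumes you already have per-base depths from a pileup as a plain list; the function name `binned_depth_ratio` and the toy numbers are made up for illustration and are not taken from any CNV tool.

```python
from statistics import mean

def binned_depth_ratio(depth, bin_size):
    """Mean depth of each non-overlapping bin divided by the chromosome-wide mean.

    `depth` is a list of per-base read depths (assumed to come from a pileup).
    Returns a list of (bin_start, ratio) tuples. Assumes the mean depth is > 0.
    """
    chrom_mean = mean(depth)
    ratios = []
    for start in range(0, len(depth), bin_size):
        window = depth[start:start + bin_size]
        ratios.append((start, mean(window) / chrom_mean))
    return ratios

# Toy diploid example: 1,000 bp at depth 40, with a heterozygous
# 100 bp deletion (depth 20) at positions 400-499.
depth = [40] * 1000
depth[400:500] = [20] * 100

small_bins = binned_depth_ratio(depth, 100)   # ten 100 bp bins
large_bins = binned_depth_ratio(depth, 1000)  # one 1,000 bp bin

print(small_bins[4])  # bin starting at 400: ratio of about 0.53
print(large_bins[0])  # single large bin: ratio of 1.0, deletion is invisible
```

With 100 bp bins the deleted region stands out at roughly half the average depth, while the single 1,000 bp bin averages the deletion away. Real data is noisy, so in practice you would compare each bin against its neighbours or a control sample rather than rely on an exact ratio.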
However, keep in mind that smaller windows are more computationally intensive (more coverage calculations have to be done and more numbers have to be stored), and very small windows contain few reads, so their counts are noisier. Bins can also overlap. Overlapping bins are often called sliding windows, because you can think of the process as sliding a 'window' along the genome and doing a calculation at each position of the window.
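As a rough sketch of the overlapping case, the Python below slides a fixed-size window along the per-base depths in steps smaller than the window. The function name `sliding_window_ratio`, the `step` parameter, and the toy data are assumptions for illustration only.

```python
from statistics import mean

def sliding_window_ratio(depth, window_size, step):
    """Mean depth of each (possibly overlapping) window over the chromosome-wide mean.

    Windows overlap whenever step < window_size.
    Returns a list of (window_start, ratio) tuples. Assumes the mean depth is > 0.
    """
    chrom_mean = mean(depth)
    results = []
    for start in range(0, len(depth) - window_size + 1, step):
        window = depth[start:start + window_size]
        results.append((start, mean(window) / chrom_mean))
    return results

# Toy diploid example: a heterozygous 100 bp deletion at positions 450-549,
# which straddles the boundary between two fixed 100 bp bins.
depth = [40] * 1000
depth[450:550] = [20] * 100

windows = sliding_window_ratio(depth, window_size=100, step=50)

print(windows[8])  # window starting at 400: half deleted, ratio of about 0.79
print(windows[9])  # window starting at 450: fully deleted, ratio of about 0.53
```

Fixed 100 bp bins starting at 400 and 500 would each cover only half of this deletion and show a ratio of about 0.79, whereas the overlapping window starting at 450 covers it completely. The cost is more windows to compute and store. This version recomputes each window's mean from scratch, so for a whole genome a running sum would be the more efficient choice.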