8.7 years ago by
I think it depends on overall coverage.
If you have a lot of reads, you can set the windows quite small; if you have few reads, you'll have to allow large windows. With chromosomes (or contigs) of only 100-1000 bp, a window can't be longer than the contig itself, so you need many reads to get enough counts per window.
Yoon et al. (2009) say the distribution is like a Poisson with overdispersion. I find that the overdispersion is quite strong, so you can't really treat it as a Poisson distribution. Furthermore, mappability strongly influences the number of reads per window. In our paper we say that "From our experience in several different samples, selecting window size in which there are 30–180 read counts per window on average strikes a reasonable balance between error variability and bias of CNA".
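To get a feel for how strong the overdispersion is in your own data, one quick check is the variance-to-mean ratio of the per-window counts: it should be close to 1 for a Poisson and clearly above 1 when the counts are overdispersed. This is only a rough sketch, not something taken from CNAnorm or from Yoon et al.; the function name and the example counts are made up for illustration.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Variance-to-mean ratio of per-window read counts.

    Roughly 1 for Poisson-like counts, well above 1 when overdispersed.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("all windows are empty, the ratio is undefined")
    # Sample variance (n - 1 denominator) divided by the mean count.
    return variance(counts) / m

# Made-up per-window counts, for illustration only.
example_counts = [48, 52, 50, 10, 95, 47, 120, 3, 51, 49]
print(round(dispersion_index(example_counts), 2))
```

If the ratio is far above 1 even after you drop low-mappability windows, a plain Poisson model will underestimate the noise.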
Basically, we have observed that with fewer than 30 reads per window it becomes quite common to get windows with no reads at all, and you can't tell whether that is by chance (and low mappability) or because of an actual copy loss. You "hit the bottom" and lose information about that window. On the other hand, going above 180 reads per window doesn't gain you much and only reduces your resolution. Still, if you have very high coverage, you can go beyond that.
In fact, the CNAnorm script that converts BAM files to windows (bam2window.pl) lets you set either the size of the window OR the average number of reads per window in the sample with the fewest reads. In the latter case it calculates the right window size from the sum of the chromosome/contig lengths as reported in the header of the SAM/BAM files.
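The arithmetic behind that kind of calculation can be sketched as follows. This is my own minimal illustration of the idea (target reads per window times total length, divided by the read count of the poorest sample), not the actual code of bam2window.pl; the function name, the contig names, and all the numbers are hypothetical.

```python
def window_size_for_target(contig_lengths, reads_per_sample, target_reads_per_window):
    """Window size (bp) giving about `target_reads_per_window` reads per
    window, on average, in the sample with the fewest reads.

    contig_lengths: dict of contig name -> length in bp (e.g. from the @SQ header lines)
    reads_per_sample: list of mapped read counts, one per sample
    """
    genome_length = sum(contig_lengths.values())
    n_reads = min(reads_per_sample)  # the poorest sample sets the limit
    if n_reads <= 0:
        raise ValueError("need at least one mapped read")
    # Integer ceiling division: reads per bp is n_reads / genome_length,
    # so window = target / (reads per bp), rounded up.
    return -(-target_reads_per_window * genome_length // n_reads)

# Hypothetical contig lengths and read counts.
contigs = {"chr1": 1_000_000, "chr2": 500_000}
print(window_size_for_target(contigs, [300_000, 450_000], 30))  # 150
```

Note that this assumes reads are spread evenly over the whole length, which mappability and real copy number changes will break, so treat the result as a starting point.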
Also, consider that with very short chromosomes/contigs you might get some edge effects, as a considerable fraction of the windows will be smaller than the others.
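To see how bad the edge effect would be for a given window size, you can count how many windows end up truncated. The sketch below assumes windows are tiled from the start of each contig and that the last window of a contig is simply cut short; that tiling scheme is my assumption for illustration, and the contig names and lengths are made up.

```python
def count_windows(contig_lengths, window_size):
    """Return (full, partial): the number of full-size windows and the
    number of truncated windows at contig ends."""
    full = 0
    partial = 0
    for length in contig_lengths.values():
        full += length // window_size
        if length % window_size:
            # The leftover at the end of the contig forms one shorter window.
            partial += 1
    return full, partial

# Hypothetical short contigs, lengths in bp.
contigs = {"ctg1": 1000, "ctg2": 250, "ctg3": 100}
full, partial = count_windows(contigs, 200)
print(full, partial, partial / (full + partial))  # 6 2 0.25
```

With long chromosomes the truncated fraction is negligible, but with many short contigs it can become a sizeable share of all windows.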