I noticed that the choice of target bin size has a large impact on the result. I ran a sample with the default target bin size of 5000, and when I ran
cnvkit.py genemetrics sample.cnr
it reported 0 gene-level gains or losses. However, for the same sample, when I reduced the bin size to 1000, gains and losses were reported, some with strongly negative log2 values.
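To make the comparison between the two runs concrete, here is a minimal sketch of how the gene-level calls could be diffed. It is an illustration, not something CNVkit provides: the helper names `load_calls` and `compare_calls` are hypothetical, the column names `gene`, `log2` and `probes` are assumed to match the tab-separated genemetrics output, the cutoffs of 0.2 and 3 are assumptions meant to resemble the `-t` and `-m` options, and the rows are made-up placeholders rather than real results.

```python
import csv
import io


def load_calls(tsv_text, threshold=0.2, min_probes=3):
    """Return {gene: log2} for rows that pass both cutoffs.

    Assumes a tab-separated table with 'gene', 'log2' and 'probes' columns.
    """
    calls = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        log2 = float(row["log2"])
        if abs(log2) >= threshold and int(row["probes"]) >= min_probes:
            calls[row["gene"]] = log2
    return calls


def compare_calls(calls_a, calls_b):
    """Split genes into: called only in A, only in B, and in both."""
    only_a = sorted(set(calls_a) - set(calls_b))
    only_b = sorted(set(calls_b) - set(calls_a))
    shared = sorted(set(calls_a) & set(calls_b))
    return only_a, only_b, shared


# Made-up placeholder tables standing in for the two genemetrics outputs.
bins_5000 = "gene\tlog2\tprobes\nGENE_A\t-0.15\t4\nGENE_B\t0.05\t6\n"
bins_1000 = (
    "gene\tlog2\tprobes\n"
    "GENE_A\t-1.30\t18\n"
    "GENE_B\t0.04\t30\n"
    "GENE_C\t0.45\t2\n"
)

only_5000, only_1000, shared = compare_calls(
    load_calls(bins_5000), load_calls(bins_1000)
)
print("only at 5000:", only_5000)
print("only at 1000:", only_1000)
print("in both:", shared)
```

Genes that show up at only one bin size, especially those supported by few bins, would be the first ones to inspect when judging whether a call is real or noise.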
I was just wondering what a good choice of bin size would be. How can I ensure that I don't miss important gains and losses while also minimizing false positives? Will the calculated default always be a reasonable choice, or are there other factors that I should take into consideration?
I hope the question is clear. Thank you.