I want to get genome-wide copy number calls from a cohort (N=45) of human tumor-normal samples genotyped on Affymetrix Genome-Wide SNP 6.0 arrays. I have normalized the raw data and calculated log2 R ratios for all probes (N=933,122) for all tumor-normal pairs; so far so good. However, when I perform CBS segmentation with pruning as implemented in DNAcopy, it hangs on the 12th sample (I killed it after 48 h). When I randomly downsample the dataset to 100,000 probes, it takes only about 10 seconds per sample. Is it intrinsic to DNAcopy that it gets exponentially slower as the number of probes increases? What can I do to get good CBS-segmented copy number calls, i.e. without too many segments caused by local trends?
As a bonus question: in DNAcopy the min.width parameter only accepts values between 2 and 5. What can I do if I want the segmentation algorithm to require at least 20 probes per segment?
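# Columns of the log2 R ratio matrix that belong to the samples of interest
samples <- grep("MySampleIdentifier", colnames(lrr), value = TRUE)
# Randomly downsample the probes (rows) to keep the run time manageable
number.probes <- 100e3
lrr <- lrr[sample(nrow(lrr), number.probes, replace = FALSE), ]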
cna <- CNA(
segments <- segment(
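
To make the bonus question concrete: since min.width cannot be set above 5, one possible workaround is to post-process the segment table after segmentation rather than change the segmentation itself. The sketch below is only an illustration of that idea, not DNAcopy functionality. The helper name merge_short_segments and the toy data are hypothetical; the only thing it assumes about DNAcopy is that the segment table has the columns ID, chrom, loc.start, loc.end, num.mark and seg.mean. Each segment with fewer than 20 probes is merged into the adjacent segment whose mean is closest.

```r
# Hypothetical helper (not part of DNAcopy): merge every segment with fewer
# than `min_probes` markers into the neighbouring segment with the closest mean.
merge_short_segments <- function(seg, min_probes = 20) {
  # Handle each sample/chromosome combination independently
  groups <- split(seg, list(seg$ID, seg$chrom), drop = TRUE)
  merged <- lapply(groups, function(g) {
    g <- g[order(g$loc.start), ]
    while (nrow(g) > 1 && any(g$num.mark < min_probes)) {
      i <- which.min(g$num.mark)                # shortest segment first
      nb <- c(i - 1, i + 1)
      nb <- nb[nb >= 1 & nb <= nrow(g)]         # neighbours that exist
      j <- nb[which.min(abs(g$seg.mean[nb] - g$seg.mean[i]))]
      n <- g$num.mark[i] + g$num.mark[j]
      # Probe-weighted mean of the two segments being merged
      g$seg.mean[j] <- (g$seg.mean[i] * g$num.mark[i] +
                        g$seg.mean[j] * g$num.mark[j]) / n
      g$num.mark[j] <- n
      g$loc.start[j] <- min(g$loc.start[i], g$loc.start[j])
      g$loc.end[j] <- max(g$loc.end[i], g$loc.end[j])
      g <- g[-i, ]
    }
    g
  })
  out <- do.call(rbind, merged)
  rownames(out) <- NULL
  out
}

# Toy segment table (made-up values) laid out like a DNAcopy segment table
toy <- data.frame(
  ID = "S1", chrom = 1,
  loc.start = c(1, 1001, 1201, 5001),
  loc.end   = c(1000, 1200, 5000, 9000),
  num.mark  = c(100, 5, 300, 250),
  seg.mean  = c(0.02, 0.60, 0.55, -0.40)
)
res <- merge_short_segments(toy, min_probes = 20)
print(res)
```

In the toy table the 5-probe segment is absorbed into the following 300-probe segment because their means are closest. A chromosome that consists of a single short segment is left unchanged. Whether merging by closest mean is appropriate for a given dataset is a judgment call; this does not address the run-time problem.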