Question: CBS segmentation with DNAcopy taking ages/hanging
3
gravatar for Irsan
5.0 years ago by
Irsan6.8k
Amsterdam
Irsan6.8k wrote:

I want to get genome wide copy number calls from a cohort (N=45) of human tumor-normal samples genotyped by Affymetrix GenomeWide SNP6.0 arrays. I have normalized the raw data and calculated Log2-R-Ratios for all probes (N=933,122) for all tumor-normal pairs, so far so good. However, when I perform CBS-segmentation with pruning implemented by DNAcopy it hangs (after 48h I killed it) on the 12the sample. However, when I randomly dilute the dataset to 100,000 probes, it takes about 10 seconds per sample to do the job. Is it something intrinsic from DNAcopy that it exponentially gets slower when the numbers of probes increase? What can I do to get good (i.e. not too much segments due to local trends) CBS-segmented copy number calls?

As a bonus question, in DNAcopy the min.width parameter only allows values between 2 and 5. What can I do when I want the segmentation algorithm to use at least 20 probes for segments?

library(DNAcopy)
samples <- colnames(lrr)[grep("MySampleIdentifier",colnames(lrr))]
number.probes <- 100e3
lrr <- lrr[sample(1:nrow(lrr),number.probes,replace=F),]
cna <- CNA(
  genomdat=lrr[,samples],
  chrom=lrr$chromosome,
  maploc=lrr$physicalPosition,
  data.type="logratio",
  sampleid=samples
)
segments <- segment(
  cna,
  min.width=5,
  undo.splits="prune",
  undo.prune=0.05
)

 

snp software error • 2.8k views
ADD COMMENTlink modified 5.0 years ago • written 5.0 years ago by Irsan6.8k
1
gravatar for arno.guille
5.0 years ago by
arno.guille400
France
arno.guille400 wrote:

segments <- segment(cna)

With the "prune" option, the CBS is very slow for high resolution chips. I suggest you to keep the default parameters.

Then the min.width parameter seems to have no effect. Try this script in addition to the CBS

http://www.cbs.dtu.dk/~hanni/aCGH/MergeLevels.html

 

ADD COMMENTlink modified 5.0 years ago • written 5.0 years ago by arno.guille400

Hi Arno, thanks for your suggestions, I'll try the MergeLevels script after default segmentation.
 

ADD REPLYlink written 5.0 years ago by Irsan6.8k

Hmm, the MergeLevels script seems very inconvenient as you have to transform the segmented data back to a segmented value for each probe. You can transform the CBS output this way:

predicted.lrr <-
  as.matrix(
    sapply(unique(segments$output$ID),
    function(x){
      index <- which(segments$output$ID==x)
      as.numeric(unlist(apply(segments$output[index,],1,function(t){rep(t[6],t[5])})))
    })
  )

But still, when MergeLevels is done you end up with a vector of corrected segment levels but not in the DNAcopy output format with 1 row per segment which for me is very convenient for downstream analyses. I will use CGHcall/CGHregions in stead.

ADD REPLYlink written 5.0 years ago by Irsan6.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 779 users visited in the last hour