Question: Noise Dependent Copy Number Segmentation
9.4 years ago by
Yuri1.5k
Bethesda, MD
Yuri1.5k wrote:

I'm dealing with quite noisy copy number data from Affymetrix arrays (100K, 500K, etc.). Obviously, the resolution at which aberrant regions can be detected depends on the signal-to-noise ratio: the noisier the data, the larger a region has to be before it can be detected reliably. With the noisiest samples I can probably only call whole-chromosome loss or gain reliably, but that's OK.

However, many segmentation algorithms I have tried (HMM, CBS, FASeg, etc.) do not estimate the noise before processing the data, so I have to optimize the parameters for almost every single sample.

Do you know of, or have experience with, an algorithm that would automate this task? Or what are the best practices for such an analysis?

algorithm cnv • 2.4k views
modified 8.7 years ago by Khader Shameer18k • written 9.4 years ago by Yuri1.5k

Are you looking at chromosomal aberrations?

written 9.4 years ago by Khader Shameer18k

Yes, chromosomal deletions and amplifications.

written 9.4 years ago by Yuri1.5k
9.4 years ago by
Chris Miller21k
Washington University in St. Louis, MO
Chris Miller21k wrote:

All aCGH data is pretty noisy. If it were clean, demarcating regions of CN gain and loss would be easy and we wouldn't need complex segmentation algorithms. One of the points of these algorithms is that they average the signal across multiple probes to try to cut through that noise.
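To put rough numbers on that averaging, here is a minimal Python sketch. It assumes independent, roughly Gaussian probe noise, so the mean of n probes has standard error sd/sqrt(n). The function names `estimate_noise_sd` and `min_probes_for_shift` are hypothetical and are not part of CBS, DNAcopy or any HMM package; the noise estimate (median absolute deviation of differences between adjacent probes) is one common robust choice, not what those tools do internally.

```python
import math
import statistics


def estimate_noise_sd(log_ratios):
    """Robust per-sample noise estimate from adjacent-probe differences.

    Differencing neighbouring probes cancels the piecewise-constant copy
    number signal almost everywhere, and the median absolute deviation
    (MAD) is insensitive to the few large jumps at true breakpoints.
    """
    diffs = [b - a for a, b in zip(log_ratios, log_ratios[1:])]
    med = statistics.median(diffs)
    mad = statistics.median(abs(d - med) for d in diffs)
    # 1.4826 rescales a MAD to a Gaussian SD; dividing by sqrt(2)
    # undoes the variance doubling caused by differencing.
    return 1.4826 * mad / math.sqrt(2)


def min_probes_for_shift(noise_sd, shift, z=3.0):
    """Smallest n for which shift >= z * noise_sd / sqrt(n)."""
    return max(1, math.ceil((z * noise_sd / shift) ** 2))


# A sample with noise SD 1.0 needs about 36 probes to resolve a
# log-ratio shift of 0.5 at z = 3; with noise SD 0.25 it needs 3.
print(min_probes_for_shift(1.0, 0.5), min_probes_for_shift(0.25, 0.5))
```

Under these assumptions, the per-sample estimate could be used to pick a minimum segment size for each array instead of tuning it by hand, which is the noise-dependent resolution described in the question.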

That said, unless the array prep was botched and the data is really nasty, you should be getting way more than whole-chromosome resolution from any of those platforms.

Here are some tips for using CBS through the DNAcopy package (with which I'm most familiar):

  • If you're following the vignette, notice that it includes a smoothing step that will help take care of some of the outliers.
  • Be sure to set the minimum number of probes to something sensible. I'd say use three probes minimum, and if you want to be really confident, use something more like 5 or 6.
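As a language-neutral illustration of those two tips, here is a small Python sketch. It is not DNAcopy's implementation: `smooth_outliers` and `drop_short_segments` are hypothetical helpers, the window size and threshold are arbitrary assumptions, and `noise_sd` is assumed to have been estimated separately for the sample.

```python
import statistics


def smooth_outliers(values, noise_sd, window=2, k=4.0):
    """Replace isolated outlier probes with the median of their neighbours.

    A probe counts as an outlier when it differs from the median of up to
    `window` probes on each side (itself excluded) by more than
    k * noise_sd. Probes near the ends have fewer neighbours, so this
    simple rule is less reliable there.
    """
    smoothed = list(values)
    for i, v in enumerate(values):
        lo, hi = max(0, i - window), min(len(values), i + window + 1)
        neighbours = [values[j] for j in range(lo, hi) if j != i]
        if not neighbours:
            continue
        local = statistics.median(neighbours)
        if abs(v - local) > k * noise_sd:
            smoothed[i] = local
    return smoothed


def drop_short_segments(segments, min_probes=3):
    """Keep (start, end, mean) segments covering at least min_probes probes.

    `start` is inclusive and `end` is exclusive, both as probe indices.
    """
    return [s for s in segments if s[1] - s[0] >= min_probes]


probes = [0.0, 0.1, -0.1, 0.05, 5.0, 0.0, -0.05, 0.1, 0.0]
cleaned = smooth_outliers(probes, noise_sd=0.1)  # only the 5.0 spike changes
calls = drop_short_segments([(0, 10, 0.0), (10, 12, 0.8), (12, 40, -0.3)])
```

Raising `min_probes` for noisier samples trades resolution for confidence, which is the same trade-off as choosing between three probes and five or six.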

You might also consider using the NoWaves package, which removes another type of bias in the data. (I have no experience with this, but have heard good things from a colleague).

written 9.4 years ago by Chris Miller21k
9.4 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

These kinds of data are inherently noisy, unfortunately. There is surprisingly little discussion of noise in any of the relevant Bioconductor documentation, except for suggested visualisations of signal/noise ratio and some brief comments on smoothing. See, e.g. crlmm, VanillaICE (HMMs for copy number estimation) and DNAcopy, for some sample workflows.

A Google search for "copy number" + "noise" turns up some references that look interesting.

A Bayesian segmentation approach to ascertain copy number variations at the population level - authors claim that "Our Bayesian approach, on the other hand, identifies the exact true segments even when noise levels are high."

Joint estimation of copy number variation and reference intensities on multiple DNA arrays using GADA - motivation for the study described as "New approaches capable of jointly modeling the copy number and the non-copy number (noise) hybridization effects across multiple samples will potentially lead to more accurate results."

More discussion of noise in:

A versatile statistical analysis algorithm to detect genome copy number variation

and:

Genome-wide detection of human copy number variations using high-density DNA oligonucleotide arrays - "The method described here has been developed to reduce systematic noise and precisely extract significant intensity information".

written 9.4 years ago by Neilfws48k
9.4 years ago by
Manhattan, NY
Khader Shameer18k wrote:

I know about Illumina-based algorithms for detecting chromosomal aberrations. For example, CNV Partition can be used to detect copy number variation using the log R ratio and B-allele frequency. I am sure there are similar tools for Affy arrays. Depending on the type of aberrations you are looking at, you may need to add additional algorithms to your analysis.
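For readers unfamiliar with those two quantities, a rough Python sketch follows. The log R ratio is the log2 of the observed total intensity over the intensity expected for a normal two-copy genotype. The second function is only a crude stand-in: the real B-allele frequency is interpolated from genotype cluster positions, which is not reproduced here. Both function names are hypothetical and not taken from any Illumina tool.

```python
import math


def log_r_ratio(observed_r, expected_r):
    """log2(observed / expected) total intensity for one SNP.

    Values near 0 suggest two copies; negative values suggest loss and
    positive values suggest gain.
    """
    return math.log2(observed_r / expected_r)


def naive_b_allele_fraction(a_intensity, b_intensity):
    """Crude allelic ratio B / (A + B).

    ASSUMPTION: this is a simplified proxy, not the cluster-based
    B-allele frequency reported by genotyping software.
    """
    return b_intensity / (a_intensity + b_intensity)


# Half the expected intensity gives a log R ratio of -1 in this idealised model.
print(log_r_ratio(1.0, 2.0))
# Equal A and B signal, as at a balanced heterozygous SNP, gives 0.5.
print(naive_b_allele_fraction(1.0, 1.0))
```

In practice measured log R ratios are usually compressed relative to these idealised values, so thresholds have to be calibrated on real data.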

modified 9.4 years ago • written 9.4 years ago by Khader Shameer18k