Question: CNVkit: Dealing with noisy tumor samples
Fil wrote (15 months ago):

Hi all,

I have a question about CNV analysis (I'm a new CNVkit user). I am analyzing whole-exome sequencing data from tumor samples with CNVkit (v0.9.3), following the workflow from the CNV tutorial:

- First, I ran the 'batch' command on all my normal samples to generate a pooled reference (n = 50 normal samples).
- Next, using the pooled reference, I called copy number in my tumor samples (n > 700 tumor samples).
- Then, I ran the 'metrics' command to evaluate sample quality, inspected the coverages, and checked for tumor samples with an extremely high number of segments so I could remove them (all looked OK).
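The three steps above map onto CNVkit command lines roughly as follows. This is a minimal sketch that only builds the argument lists: the BAM, BED and FASTA names are hypothetical, the flags are from my understanding of the 0.9.x CLI (check `cnvkit.py batch --help` for your version), and it assumes `batch` names its outputs `<sample>.cnr` / `<sample>.cns` after the BAM file.

```python
import os
import shlex


def build_reference_cmd(normal_bams, targets_bed, fasta, ref_out="pooled_reference.cnn"):
    """Step 1: pool all normal BAMs into one copy number reference (no tumor BAMs given)."""
    return ["cnvkit.py", "batch", "--normal", *normal_bams,
            "--targets", targets_bed, "--fasta", fasta,
            "--output-reference", ref_out]


def call_tumors_cmd(tumor_bams, ref_cnn, out_dir="results"):
    """Step 2: process the tumor BAMs against the pooled reference."""
    return ["cnvkit.py", "batch", *tumor_bams,
            "--reference", ref_cnn, "--output-dir", out_dir]


def metrics_cmd(tumor_bams, out_dir="results"):
    """Step 3: summarize bin-level spread and segment counts per sample.

    Assumes batch wrote <sample>.cnr and <sample>.cns into out_dir,
    with <sample> taken from the BAM file name.
    """
    samples = [os.path.splitext(os.path.basename(b))[0] for b in tumor_bams]
    cnr = [os.path.join(out_dir, s + ".cnr") for s in samples]
    cns = [os.path.join(out_dir, s + ".cns") for s in samples]
    return ["cnvkit.py", "metrics", *cnr, "-s", *cns]


if __name__ == "__main__":
    normals = ["N01.bam", "N02.bam"]  # hypothetical file names
    tumors = ["T01.bam", "T02.bam"]
    for cmd in (build_reference_cmd(normals, "baits.bed", "hg19.fa"),
                call_tumors_cmd(tumors, "pooled_reference.cnn"),
                metrics_cmd(tumors)):
        print(shlex.join(cmd))  # or: subprocess.run(cmd, check=True)
```

Building lists rather than shell strings keeps file names with spaces safe if you later pass them to `subprocess.run`.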

Note that I am using a pooled normal reference because my samples don't have matched normal controls.

I generated the scatter plot from the 'call' command output using the default parameters (attached). As you can see, the scatter plot is very noisy. For reference, I also tried increasing the bin size in the 'batch' command to see whether that would reduce the noise, but it made no visible difference.
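To put a number on "noisy" instead of judging the plot by eye, one option is a robust spread statistic per sample. CNVkit's own `metrics` command reports spread statistics of this kind; the sketch below instead swaps in a common heuristic that is not CNVkit's metric: the MAD of differences between consecutive bins, which is largely insensitive to real copy number segments. The cutoff `k = 3.0` is a hypothetical choice, bins are assumed to be in genomic order, and the demo data are simulated.

```python
import random
import statistics


def diff_mad(log2_ratios):
    """Bin-to-bin noise: MAD of differences between consecutive bins.

    Differencing cancels most of the signal from real copy number segments
    (only breakpoints contribute large jumps), so the result mostly reflects
    technical noise. Assumes bins are listed in genomic order. Dividing by
    sqrt(2) rescales it to the MAD of a single bin's noise.
    """
    diffs = [b - a for a, b in zip(log2_ratios, log2_ratios[1:])]
    med = statistics.median(diffs)
    return statistics.median(abs(d - med) for d in diffs) / 2 ** 0.5


def flag_noisy(samples, k=3.0):
    """Return sample names whose noise exceeds cohort median + k * MAD.

    `samples` maps a sample name to its list of bin log2 ratios.
    k = 3.0 is an arbitrary, hypothetical cutoff.
    """
    noise = {name: diff_mad(vals) for name, vals in samples.items()}
    values = list(noise.values())
    med = statistics.median(values)
    spread = statistics.median(abs(v - med) for v in values)
    return sorted(name for name, v in noise.items() if v > med + k * spread)


if __name__ == "__main__":
    # Simulated data only: five quiet samples and one noisy one.
    rng = random.Random(0)
    cohort = {f"quiet{i}": [rng.gauss(0, 0.1) for _ in range(500)] for i in range(5)}
    cohort["noisy"] = [rng.gauss(0, 0.5) for _ in range(500)]
    print(flag_noisy(cohort))
```

A clean step from 0 to 1 (a true segment change with no noise) scores 0 under `diff_mad`, which is why differencing is used here rather than the plain MAD of the log2 values.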

My question: do you have any suggestions for dealing with noisy samples like this?

Many thanks in advance for your help!
Regards,
Fil

[Figure: scatter plot from 'call' output (default parameters)]

Eric T. (San Francisco, CA) wrote (15 months ago):

Hi Fil, I think we've been in contact already. My suggestions so far are:

  • Use metrics on the control samples as well to remove any noisy or unsuitable samples from the copy number reference.
  • Use a more stringent segmentation p-value, e.g. 1e-6
  • Use segmetrics and call --filter ci to remove poorly supported segment breakpoints, if you're not already doing so (see here)
  • If the segmentation is still poor, try gainloss/genemetrics without segments to see per-gene mean log2 ratios.
  • Try updating to 0.9.5 to get better performance from the segmetrics command on exomes.
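Suggestions 2 to 4 could be chained per sample roughly like this. It is a sketch that only builds argument lists; the file names are hypothetical, it assumes `<sample>.cnr` already exists from an earlier `batch` run, and the flags reflect my understanding of the 0.9.x CLI (`genemetrics` being the newer name for `gainloss`), so verify them against `--help` for your installed version.

```python
import shlex


def post_process_cmds(sample, threshold="1e-6"):
    """Follow-up CNVkit commands for one sample, mirroring the suggestions above.

    Assumes <sample>.cnr already exists (e.g. written by an earlier `batch` run);
    all output file names are hypothetical.
    """
    cnr = f"{sample}.cnr"
    cns = f"{sample}.cns"
    seg = f"{sample}.segmetrics.cns"
    return [
        # Re-segment with a more stringent significance threshold.
        ["cnvkit.py", "segment", cnr, "-t", threshold, "-o", cns],
        # Annotate each segment with a bootstrap confidence interval of its mean.
        ["cnvkit.py", "segmetrics", cnr, "-s", cns, "--ci", "-o", seg],
        # Use those intervals to filter poorly supported segments when calling.
        ["cnvkit.py", "call", seg, "--filter", "ci", "-o", f"{sample}.call.cns"],
        # Segment-free fallback: per-gene mean log2 ratios from the bins alone.
        ["cnvkit.py", "genemetrics", cnr, "-o", f"{sample}.genemetrics.tsv"],
    ]


if __name__ == "__main__":
    for cmd in post_process_cmds("T01"):  # "T01" is a placeholder sample name
        print(shlex.join(cmd))  # or: subprocess.run(cmd, check=True)
```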

Dear Eric,

Have you had a chance to check the performance of CNVkit with pooled tumor samples as the reference? We have targeted hybrid-capture sequencing data and no normal samples. I'm wondering which of the following setups should, in principle, be preferred:

1) Calling CNVs with no control samples (a flat reference).

2) Calling CNVs against a pool of tumor samples, prepared with the same library preparation method and sequenced together, used as the control.
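The two setups could be expressed as `batch` runs like these. This is a sketch under assumptions: the file names are hypothetical, and it relies on my understanding that `--normal` given with no files makes `batch` build a flat reference, while `--normal` followed by BAMs (and no positional tumor BAMs) only builds a pooled reference.

```python
import shlex


def flat_reference_cmds(tumor_bams, targets_bed, fasta, out_dir="results_flat"):
    """Option 1: no controls. `--normal` given with no files requests a flat reference."""
    return [["cnvkit.py", "batch", *tumor_bams, "--normal",
             "--targets", targets_bed, "--fasta", fasta,
             "--output-reference", "flat_reference.cnn", "--output-dir", out_dir]]


def tumor_pool_cmds(tumor_bams, targets_bed, fasta, out_dir="results_pool"):
    """Option 2: pool the tumors themselves into a reference, then call against it."""
    ref = "tumor_pool_reference.cnn"
    build = ["cnvkit.py", "batch", "--normal", *tumor_bams,
             "--targets", targets_bed, "--fasta", fasta,
             "--output-reference", ref]
    call = ["cnvkit.py", "batch", *tumor_bams,
            "--reference", ref, "--output-dir", out_dir]
    return [build, call]


if __name__ == "__main__":
    tumors = ["T01.bam", "T02.bam", "T03.bam"]  # hypothetical file names
    cmds = (flat_reference_cmds(tumors, "baits.bed", "hg19.fa")
            + tumor_pool_cmds(tumors, "baits.bed", "hg19.fa"))
    for cmd in cmds:
        print(shlex.join(cmd))
```

Writing each option to its own output directory keeps the two result sets side by side for comparison.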

I understand the advantages of using a "panel of tumors": it especially helps to handle the large variation in depth across targets and to reduce batch effects. However, the main drawback is that we may miss real CNAs, since they are also present in the tumor samples that make up the reference.

Many thanks!

Reply written 15 months ago by biobiu110

Probably 2, unless there are highly recurrent CNAs in your cohort. I would try it both ways and compare the results.
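One way to "compare the results" is gene by gene. The sketch below assumes you have already parsed each run's per-gene mean log2 ratios (for example from `genemetrics` output) into a dict; the ±0.2 gain/loss cutoff and the gene values in the example are made up for illustration.

```python
def classify(log2, threshold=0.2):
    """Crude three-way call from a mean log2 ratio (threshold is hypothetical)."""
    if log2 >= threshold:
        return "gain"
    if log2 <= -threshold:
        return "loss"
    return "neutral"


def compare_runs(run_a, run_b, threshold=0.2):
    """Compare per-gene calls from two runs (e.g. flat vs tumor-pool reference).

    run_a, run_b: dict mapping gene -> mean log2 ratio.
    Only genes present in both runs are compared.
    Returns (agreement fraction, {gene: (call_a, call_b)} for discordant genes).
    """
    shared = sorted(set(run_a) & set(run_b))
    if not shared:
        raise ValueError("the two runs have no genes in common")
    discordant = {}
    for gene in shared:
        a, b = classify(run_a[gene], threshold), classify(run_b[gene], threshold)
        if a != b:
            discordant[gene] = (a, b)
    return 1 - len(discordant) / len(shared), discordant


if __name__ == "__main__":
    # Made-up values, for illustration only.
    flat = {"EGFR": 1.1, "CDKN2A": -1.5, "TP53": -0.05, "MYC": 0.3}
    pool = {"EGFR": 1.0, "CDKN2A": -0.1, "TP53": 0.0, "MYC": 0.25}
    print(compare_runs(flat, pool))
```

Genes called as gains or losses with the flat reference but neutral with the tumor pool are the ones worth checking first, since a CNA that recurs across the pooled tumors would be partly absorbed into that reference.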

Reply written 14 months ago by Eric T.