Question: CNVkit: Dealing with noisy tumor samples
Fil wrote (5 months ago):

Hi all,

I have a question about CNV analysis (I am a new CNVkit user). I am analyzing tumor samples from whole-exome sequencing with CNVkit (v0.9.3). I followed the workflow from the CNV tutorial:

- First, I ran the 'batch' command on all my normal samples to generate a pooled reference (n = 50 normal samples).
- Next, using the pooled reference, I called copy number for my tumor samples (n > 700 tumor samples).
- Then, I ran the 'metrics' command to evaluate sample quality, inspected the coverages, and looked for tumor samples with extremely high segmentation to remove (all looked OK).
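
As a rough illustration of the 'metrics' filtering step, here is a minimal Python sketch that flags samples whose spread statistic is far above the cohort median. The column names (sample, segments, stdev, mad, iqr, bivar) are an assumption about the tab-separated table that 'metrics' writes, and the table values, the `flag_noisy` helper, and the 2x-median cutoff are all hypothetical; adjust them to your own output.

```python
import csv
import io
from statistics import median


def flag_noisy(metrics_tsv, column="bivar", factor=2.0):
    """Return sample names whose `column` value exceeds factor * cohort median.

    `metrics_tsv` is the text of a tab-separated metrics table.
    The column layout is assumed, not taken from the CNVkit docs.
    """
    rows = list(csv.DictReader(io.StringIO(metrics_tsv), delimiter="\t"))
    values = [float(row[column]) for row in rows]
    cutoff = factor * median(values)
    return [row["sample"] for row in rows if float(row[column]) > cutoff]


# Hypothetical metrics table, for illustration only.
example = (
    "sample\tsegments\tstdev\tmad\tiqr\tbivar\n"
    "N01.cnr\t40\t0.20\t0.15\t0.20\t0.18\n"
    "N02.cnr\t45\t0.22\t0.16\t0.22\t0.20\n"
    "N03.cnr\t310\t0.70\t0.55\t0.75\t0.65\n"
    "N04.cnr\t42\t0.21\t0.15\t0.21\t0.19\n"
)

print(flag_noisy(example))
```

The same check can be run on the normal samples before building the pooled reference, not only on the tumors.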

Note, I am using a pooled normal as a reference, since my samples don't have a matched normal control.

I generated the scatter plot from the 'call' command output using the default parameters (linked below). As you can see, the scatter plot is very noisy. For your reference, I also tried increasing the bin size in the 'batch' command to see whether that would reduce the noise, but it made no difference.

My question is: do you have any suggestions for dealing with noisy samples such as this one?

Scatter plot from 'call' output (default parameters): https://ibb.co/6nwwSGW

Many thanks in advance for your help!

Regards,
Fil

Eric T. wrote (5 months ago):

Hi Fil, I think we've been in contact already. My suggestions so far are:

  • Use metrics on the control samples as well to remove any noisy or unsuitable samples from the copy number reference.
  • Use a more stringent segmentation p-value, e.g. 1e-6
  • Use segmetrics and call --filter ci to remove poorly supported segment breakpoints, if you're not already doing so.
  • If the segmentation is still poor, try gainloss/genemetrics without segments to see per-gene mean log2 ratios.
  • Try updating to 0.9.5 to get better performance from the segmetrics command on exomes.
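
For concreteness, the segmentation-related suggestions above might translate into commands like the ones this small Python sketch prints. The file names and the `resegment_commands` helper are hypothetical, and the flags are written from memory of the CNVkit command-line interface (the last command is named gainloss in older releases), so check them against `cnvkit.py <command> -h` for your installed version before running anything.

```python
import shlex


def resegment_commands(sample, threshold="1e-6"):
    """Build argument lists for re-segmenting one sample more stringently.

    `sample` is a file prefix; all file names here are placeholders.
    """
    cnr = f"{sample}.cnr"
    cns = f"{sample}.cns"
    segm = f"{sample}.segmetrics.cns"
    called = f"{sample}.call.cns"
    return [
        # Re-segment with a stricter significance threshold.
        ["cnvkit.py", "segment", cnr, "-t", threshold, "-o", cns],
        # Add confidence intervals to each segment.
        ["cnvkit.py", "segmetrics", cnr, "-s", cns, "--ci", "-o", segm],
        # Drop segments whose confidence interval includes zero.
        ["cnvkit.py", "call", segm, "--filter", "ci", "-o", called],
        # Per-gene log2 ratios straight from the bins, without segments.
        ["cnvkit.py", "genemetrics", cnr],
    ]


for cmd in resegment_commands("Sample"):
    print(shlex.join(cmd))
```

Nothing is executed here; the commands are only printed so they can be reviewed or pasted into a shell.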

Dear Eric,

Have you had a chance to check the performance of CNVkit with pooled tumor samples? We have targeted hybrid-capture sequencing data and no normal samples. I'm wondering which of the following setups should, in principle, be preferred:

1) Calling CNVs with no control samples (a flat reference).

2) Calling CNVs with a pool of tumor samples, prepared with the same library preparation method and sequenced together, used as the control.

I understand the advantage of using a "panel of tumors": it helps to deal with the large variation in depth across targets and to reduce batch effects. However, the main drawback is that we will miss real CNAs that also appear in the pooled tumor samples.

Many thanks!

Comment written 4 months ago by biobiu90

Probably 2, unless there are highly recurrent CNAs in your cohort. I would try it both ways and compare the results.
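
A toy Python calculation shows why recurrence matters for a tumor-pool reference. It uses a plain median as a simplified stand-in for whatever robust per-bin average the reference step computes, and every number and helper name is made up for illustration: a gain carried by a minority of the pool leaves the reference untouched, while a gain carried by most of the pool is absorbed into the reference and cancels out.

```python
from statistics import median

GAIN = 0.58  # about log2(3/2): one extra copy in a pure diploid sample


def pool_with_recurrence(n_samples, n_with_gain):
    """Per-bin log2 values across a pool where some samples carry the gain."""
    return [GAIN] * n_with_gain + [0.0] * (n_samples - n_with_gain)


def corrected_log2(sample_log2, pool_log2):
    """Subtract the pool's per-bin center (a plain median in this sketch)."""
    return sample_log2 - median(pool_log2)


# Gain present in 4 of 20 pooled tumors: the signal survives.
rare = corrected_log2(GAIN, pool_with_recurrence(20, 4))
# Gain present in 14 of 20 pooled tumors: the signal is normalized away.
common = corrected_log2(GAIN, pool_with_recurrence(20, 14))

print(rare, common)
```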

Reply written 3 months ago by Eric T.