Question

CNVkit: Dealing with noisy tumor samples

5.2 years ago

Fil • 0

Hi all,

I have a question about CNV analysis (I am a new CNVkit user). I am analyzing tumor samples from whole-exome sequencing with CNVkit (v0.9.3). I followed the workflow from the CNV tutorial:

- First, I ran the 'batch' command on all my normal samples to generate a pooled reference (n = 50 normal samples).
- Next, using the pooled reference, I called copy number from my tumor samples (n > 700 tumor samples).
- Then, I ran the 'metrics' command to evaluate sample quality, inspected the coverages, and removed tumor samples with an extremely high number of segments (all looked OK).
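
For readers following along, the three steps above might be scripted roughly as below. This is only a sketch: every file name and directory is a hypothetical placeholder, the flags shown are a subset of what 'batch' accepts, and the commands are assembled and printed rather than executed.

```python
import shlex

# Hypothetical file names; substitute your own BAMs, bait BED and genome FASTA.
normal_bams = ["normal_01.bam", "normal_02.bam"]
tumor_bams = ["tumor_01.bam", "tumor_02.bam"]

# Step 1: build a pooled reference from the normal samples only.
build_reference = [
    "cnvkit.py", "batch",
    "--normal", *normal_bams,
    "--targets", "baits.bed",
    "--fasta", "genome.fa",
    "--output-reference", "pooled_reference.cnn",
    "--output-dir", "reference_out",
]

# Step 2: process the tumor samples against that pooled reference.
call_tumors = [
    "cnvkit.py", "batch", *tumor_bams,
    "--reference", "pooled_reference.cnn",
    "--output-dir", "tumor_out",
    "--scatter",
]

# Step 3: summarize per-sample noise from the bin-level (.cnr) and segment (.cns) files.
stems = [bam[:-len(".bam")] for bam in tumor_bams]
metrics = ["cnvkit.py", "metrics"]
metrics += [f"tumor_out/{stem}.cnr" for stem in stems]
metrics += ["-s"] + [f"tumor_out/{stem}.cns" for stem in stems]

# Print the commands for inspection; nothing is run here.
for cmd in (build_reference, call_tumors, metrics):
    print(shlex.join(cmd))
```

The printed lines can be pasted into a shell once the placeholder paths are replaced.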

Note, I am using a pooled normal as a reference, since my samples don't have a matched normal control.

I was able to generate the scatter plot from the 'call' command output using the default parameters (attached). As you can see, the scatter plot is very noisy. For your reference, I also tried increasing the bin size in the 'batch' command to see whether that would reduce the noise, but it made no difference.

My question is: do you have any suggestions for dealing with noisy samples such as this one?

Scatter plot from 'call' output (default parameters): https://ibb.co/6nwwSGW

Many thanks for your help in advance!

Regards,
Fil

CNVkit • 2.7k views

updated 5.2 years ago by Eric T. ★ 2.8k • written 5.2 years ago by Fil • 0

score 0 · Answer 1 · 2019-02-17

5.2 years ago

Eric T. ★ 2.8k

Hi Fil, I think we've been in contact already. My suggestions so far are:

- Use 'metrics' on the control samples as well, to remove any noisy or unsuitable samples from the copy number reference.
- Use a more stringent segmentation p-value threshold, e.g. 1e-6.
- Use 'segmetrics' and 'call --filter ci' to remove poorly supported segment breakpoints, if you're not already doing so (see here).
- If the segmentation is still poor, try 'gainloss'/'genemetrics' without segments to see per-gene mean log2 ratios.
- Try updating to 0.9.5 to get better performance from the 'segmetrics' command on exomes.
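
As a rough illustration of the segmentation suggestions above, the per-sample commands could be assembled along these lines. The sample name and output file names are hypothetical, and which of 'gainloss' or 'genemetrics' is available depends on the installed CNVkit version, so treat this as a sketch rather than a recipe.

```python
import shlex

sample = "tumor_01"  # hypothetical sample name

# Re-segment the bin-level ratios with a stricter significance threshold.
segment = ["cnvkit.py", "segment", f"{sample}.cnr",
           "--threshold", "1e-6",
           "-o", f"{sample}.cns"]

# Add a confidence interval for each segment's mean log2 ratio.
segmetrics = ["cnvkit.py", "segmetrics", f"{sample}.cnr",
              "-s", f"{sample}.cns", "--ci",
              "-o", f"{sample}.segmetrics.cns"]

# Use those confidence intervals to filter poorly supported segments.
call = ["cnvkit.py", "call", f"{sample}.segmetrics.cns",
        "--filter", "ci",
        "-o", f"{sample}.call.cns"]

# Per-gene summary from the bins alone (no segment file supplied).
# Older releases name this subcommand 'gainloss' instead.
genemetrics = ["cnvkit.py", "genemetrics", f"{sample}.cnr"]

# Print the commands for inspection; nothing is run here.
for cmd in (segment, segmetrics, call, genemetrics):
    print(shlex.join(cmd))
```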


Dear Eric,

Have you had a chance to check the performance of CNVkit with pooled tumor samples? We have targeted hybrid-capture sequencing data and no normal samples. I'm wondering which of the following setups should, in principle, be preferred:

1) Calling CNVs with no control samples (a flat reference).

2) Calling CNVs with a pool of tumor samples, prepared with the same library preparation method and sequenced together, used as the control.
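
To make the two setups concrete, they might be expressed as 'batch' invocations roughly as follows. All file names are hypothetical placeholders, and the commands are only assembled and printed, not run.

```python
import shlex

tumor_bams = ["tumor_01.bam", "tumor_02.bam"]  # hypothetical file names
common = ["--targets", "panel_baits.bed", "--fasta", "genome.fa"]

# Setup 1: '--normal' with no files asks 'batch' to build a flat reference.
flat = ["cnvkit.py", "batch", *tumor_bams, "--normal", *common,
        "--output-reference", "flat_reference.cnn",
        "--output-dir", "flat_out"]

# Setup 2a: pool the tumors themselves as if they were the normal samples.
pooled_build = ["cnvkit.py", "batch", "--normal", *tumor_bams, *common,
                "--output-reference", "pooled_tumor_reference.cnn",
                "--output-dir", "pooled_ref_out"]

# Setup 2b: process each tumor against that pooled-tumor reference.
pooled_run = ["cnvkit.py", "batch", *tumor_bams,
              "--reference", "pooled_tumor_reference.cnn",
              "--output-dir", "pooled_out"]

# Print the commands for inspection.
for cmd in (flat, pooled_build, pooled_run):
    print(shlex.join(cmd))
```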

I understand the advantages of using a "panel of tumors": in particular, it helps deal with the large variation in depth across targets and reduces batch effects. However, the main drawback is that we will miss real CNAs, since they also appear in the tumor samples that make up the reference.

Many thanks!

5.2 years ago by biobiu ▴ 150

Probably 2, unless there are highly recurrent CNAs in your cohort. I would try it both ways and compare the results.
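
The caveat about highly recurrent CNAs can be seen in a toy, noise-free calculation for a single bin. This is a simplified model, not CNVkit's actual reference computation: a plain median stands in for whatever robust average the tool uses, and the pool sizes are made up for illustration.

```python
import math
import statistics

# log2 ratio of a single-copy gain in a pure diploid sample (3 copies vs 2).
GAIN = math.log2(3 / 2)

def pooled_reference(n_samples, n_with_gain):
    """Median log2 depth of one bin across the pool, relative to diploid."""
    depths = [GAIN] * n_with_gain + [0.0] * (n_samples - n_with_gain)
    return statistics.median(depths)

def observed_log2(sample_log2, reference_log2):
    """Log2 ratio reported for a sample after subtracting the reference."""
    return sample_log2 - reference_log2

# Gain present in 2 of 20 pooled tumors: the reference stays at the diploid level.
rare = observed_log2(GAIN, pooled_reference(20, 2))

# Gain present in 14 of 20 pooled tumors: the reference absorbs the gain.
recurrent = observed_log2(GAIN, pooled_reference(20, 14))
neutral_in_recurrent = observed_log2(0.0, pooled_reference(20, 14))

print(round(rare, 3), round(recurrent, 3), round(neutral_in_recurrent, 3))
# prints: 0.585 0.0 -0.585
```

In this toy model a rare gain is still reported in full, whereas a gain carried by most of the pool vanishes and the copy-neutral samples appear to have a loss, which is why comparing against a flat reference is a useful cross-check.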

5.1 years ago by Eric T. ★ 2.8k