Question

Using CNVkit to identify poor-quality normal samples



5.9 years ago

Andy Lee

Problem

I am trying to run the analysis described in the "Next steps" section of this CNVkit documentation page:

http://cnvkit.readthedocs.io/en/stable/tumor.html

For the careful: Run batch with just the normal samples specified as normal, yielding coverage .cnn files and a pooled reference. Inspect the coverages of all samples with the metrics command, eliminating any poor-quality samples and choosing a larger or smaller antitarget bin size if necessary. Build an updated pooled reference using batch or coverage and reference (see Copy number calling pipeline), coordinating your work in a Makefile, Rakefile, or similar build tool.
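The workflow quoted above can be sketched as a small Python helper that builds the command lines for each step without running them, so they could be handed to a build tool or to `subprocess.run`. This is a hypothetical sketch, not part of CNVkit: the sample names, the FASTA path and the output name are placeholders, and it assumes the coverage files follow the `<sample>.targetcoverage.cnn` / `<sample>.antitargetcoverage.cnn` naming that `batch` produced here. Only the `metrics`, `reference` (with `-f` and `-o`) and `batch -r` subcommands are used.

```python
import shlex
from pathlib import Path


def cnn_pair_paths(samples, cov_dir="."):
    """List each sample's target and antitarget coverage files, in order.

    Assumes the <sample>.targetcoverage.cnn / <sample>.antitargetcoverage.cnn
    naming inside `cov_dir`.
    """
    paths = []
    for sample in samples:
        for kind in ("targetcoverage", "antitargetcoverage"):
            paths.append(str(Path(cov_dir) / f"{sample}.{kind}.cnn"))
    return paths


def careful_workflow(normals, excluded, fasta, tumor_bams,
                     ref_out="reference.cnn", cov_dir="."):
    """Return argv lists for the metrics -> reference -> batch -r steps.

    All names and paths are hypothetical placeholders; nothing is executed.
    """
    # Inspect every normal's coverage; segment (.cns) files are not needed.
    metrics = ["cnvkit.py", "metrics", *cnn_pair_paths(normals, cov_dir)]
    # Rebuild the pooled reference from the normals that were kept.
    kept = [s for s in normals if s not in excluded]
    reference = ["cnvkit.py", "reference", *cnn_pair_paths(kept, cov_dir),
                 "-f", fasta, "-o", ref_out]
    # Process the tumor samples against the new reference.
    batch = ["cnvkit.py", "batch", *tumor_bams, "-r", ref_out]
    return [metrics, reference, batch]


if __name__ == "__main__":
    steps = careful_workflow(["N1", "N2", "N3"], {"N2"}, "hg19.fa",
                             ["T1.bam", "T2.bam"])
    for argv in steps:
        print(shlex.join(argv))
```

Each returned list could be passed to `subprocess.run(argv, check=True)`; keeping the commands as data makes it easy to review which normals were excluded before anything is run.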

Based on the description above, it seems that I should be able to inspect a statistical summary of all my normal samples and choose which ones to use to build the pooled reference.

I ran the "batch" command with just the normal samples, as described above, and got a targetcoverage.cnn and an antitargetcoverage.cnn file for each normal sample, but no pooled reference file.

Questions

1) The documentation for the "metrics" command doesn't include an example that uses only the targetcoverage.cnn and antitargetcoverage.cnn files. How should I run the "metrics" command?

2) After I successfully run the analysis above, what should I be looking for in the output of the "metrics" command?

3) To build a pooled reference from only the good-quality normal samples, do I just use the "batch" command? Can I specify different numbers of tumor and normal samples in the "batch" command?


Written 5.9 years ago by Andy Lee; last updated 5.9 years ago by Eric T.

score 2 · Accepted Answer · 2018-05-26

1) You can run the metrics command on any .cnn or .cnr file, so just run it on the targetcoverage.cnn and antitargetcoverage.cnn files as you would on a .cnr file. Segments are not needed.

2) I suggest looking at the "stdev" or "bivar" column, depending on whether outliers or the overall noise level matter more to you. If you're building the reference from a small number of normals (e.g. fewer than 10), look at "stdev"; otherwise use "bivar". Higher values mean more noise, and outlier bins affect "stdev" more than "bivar".

3) Use the reference command with the targetcoverage.cnn and antitargetcoverage.cnn files you've already generated. You don't need to organize them in any particular way; just make sure the filenames match (e.g. Sample1.targetcoverage.cnn and Sample1.antitargetcoverage.cnn) and give all the .cnn files as input to the reference command. You can then use the reference you've built with batch -r to process the tumor samples.
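The stdev/bivar advice above can be applied mechanically with a short Python sketch that ranks the rows of the metrics output from noisiest to cleanest. It assumes the metrics table was saved as tab-separated text with a header row containing "sample", "stdev" and "bivar" columns; check this against your own output. The `cutoff` argument is a hypothetical, user-chosen threshold, not a value recommended by CNVkit.

```python
import csv
import io


def choose_noise_column(n_normals):
    # Fewer than 10 normals: "stdev" (sensitive to outlier bins);
    # otherwise "bivar" (less affected by outliers).
    return "stdev" if n_normals < 10 else "bivar"


def rank_by_noise(metrics_tsv, n_normals, cutoff=None):
    """Rank rows of metrics output from noisiest to cleanest.

    Assumes tab-separated text whose header includes "sample", "stdev"
    and "bivar". `cutoff` is a hypothetical threshold; rows above it are
    returned as candidates to drop from the pooled reference.
    """
    column = choose_noise_column(n_normals)
    reader = csv.DictReader(io.StringIO(metrics_tsv), delimiter="\t")
    ranked = [(row["sample"], float(row[column])) for row in reader]
    ranked.sort(key=lambda item: item[1], reverse=True)  # higher = noisier
    if cutoff is None:
        flagged = []
    else:
        flagged = [name for name, value in ranked if value > cutoff]
    return column, ranked, flagged
```

Because metrics reports one row per input file, a sample's target and antitarget coverage may appear as separate rows; a noisy row from either file is a reason to look at that sample more closely before leaving it out of the reference.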