Question: CNVkit for somatic copy number detection
stephaniem (Boston, MA) wrote, 16 days ago:


I am trying to use CNVkit to detect somatic copy number variations in 40 paired tumor-normal WES samples. I am able to run the pipeline by following the current documentation, but I am unsure how to tell from the output whether the detected variants are germline or somatic. I am interested in both, but I am mainly focused on somatic copy number variations.

Additionally, I would like to tune the parameters (I am using only the defaults now) to run the pipeline as effectively as possible. For example: for the autobin step, which BAM files should be used (normal, tumor, or both)? And for the reference, should I continue to pool all normals, or keep tumor-normal pairs for better somatic CNV detection?

Please let me know if you have any advice or suggestions on how to proceed with this type of analysis. Thanks so much in advance!

Best Regards, Stephanie

Tags: cnv, wes, cnvkit, exome
Eric T. (San Francisco, CA) answered, 10 days ago:

For autobin, use normal samples. If you have at least 5 or so normal samples prepared according to the same lab process, use those as a single pooled reference for your cohort.
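A minimal Python sketch of the pooled-reference setup, which builds (but does not run) a `cnvkit.py batch` command line. The flags used (`--normal`, `--targets`, `--fasta`, `--output-reference`, `--output-dir`) follow CNVkit's documented quick-start example; the function name, the file names, and the hard cutoff of 5 normals (taken from the "5 or so" above) are hypothetical, and `autobin` is not invoked separately here.

```python
# Rule of thumb from the answer: pool the normals when there are "at least 5 or so".
MIN_POOLED_NORMALS = 5  # hypothetical hard cutoff


def pooled_batch_command(tumor_bams, normal_bams, targets_bed, fasta,
                         reference_out="pooled_reference.cnn",
                         out_dir="cnvkit_out"):
    """Build (but do not run) a `cnvkit.py batch` argument list that pools
    every normal BAM into one reference and calls each tumor against it."""
    if len(normal_bams) < MIN_POOLED_NORMALS:
        raise ValueError(
            f"only {len(normal_bams)} normal BAMs; pooling is suggested "
            f"with at least {MIN_POOLED_NORMALS}")
    return ["cnvkit.py", "batch", *tumor_bams,
            "--normal", *normal_bams,
            "--targets", targets_bed,
            "--fasta", fasta,
            "--output-reference", reference_out,
            "--output-dir", out_dir]


if __name__ == "__main__":
    # Hypothetical file names for a 40-pair cohort.
    tumors = [f"P{i:02d}_tumor.bam" for i in range(1, 41)]
    normals = [f"P{i:02d}_normal.bam" for i in range(1, 41)]
    cmd = pooled_batch_command(tumors, normals, "baits.bed", "hg19.fa")
    # To actually execute: subprocess.run(cmd, check=True)
    print(" ".join(cmd[:4]), "...")
```

Note that each patient's matched normal goes into the shared pool alongside all the other normals rather than being paired one-to-one with its tumor; that is the pooled-reference approach recommended above.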

You can probably assume that the CNV calls you get from the default pipeline are somatic. Population-level CNVs are typically too small to be picked up by CBS segmentation of WES samples, and if you use a pooled reference, then CNVkit will also tend to de-emphasize regions with variable copy number / coverage in your pool of control samples.
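To make the "de-emphasize variable regions" point concrete, here is an illustrative Python sketch that summarizes a pool of normals bin by bin: a robust center, a spread (median absolute deviation), and a weight that shrinks as the spread grows. CNVkit's `.cnn` reference also stores a per-bin `spread` value, but the weighting formula below (`1 / (1 + spread)`) is a hypothetical stand-in, not CNVkit's own calculation, and the input is assumed to be already-normalized log2 coverage for matching bins.

```python
import statistics


def summarize_pool(normal_log2):
    """Summarize a pool of normals bin by bin.

    normal_log2: one list per normal sample, each holding log2 coverage for
    the same bins in the same order. Returns one (center, spread, weight)
    tuple per bin.
    """
    summary = []
    for values in zip(*normal_log2):  # transpose: all samples' values for one bin
        center = statistics.median(values)
        # Spread as median absolute deviation: robust to a single odd sample.
        spread = statistics.median([abs(v - center) for v in values])
        # Hypothetical weight: the more a bin varies across normals,
        # the less it should count when calling tumor copy number.
        weight = 1.0 / (1.0 + spread)
        summary.append((center, spread, weight))
    return summary


if __name__ == "__main__":
    pool = [[0.0, 0.1], [0.0, -0.9], [0.0, 0.8]]  # 3 normals x 2 bins
    for center, spread, weight in summarize_pool(pool):
        print(f"center={center:+.2f} spread={spread:.2f} weight={weight:.2f}")
```

In this toy pool the first bin is identical across all normals and keeps full weight, while the second bin varies and is down-weighted.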

It's possible that some of your samples have rare, cancer-associated germline CNVs, which would then be present in both the tumor and normal samples from a given patient. The next version of CNVkit (0.9.7, also the current code on GitHub) has an improved HMM segmentation method that can pick up these smaller CNVs, as well as a "bintest" command to test individual exons. To distinguish somatic from germline, I'd recommend building a pooled reference as usual to call CNVs in the tumor sample (using HMM or bintest), then using the same reference to call CNVs in the matched normal. Then compare the two sets of calls to see whether each normal-sample CNV is also present in the tumor; if not, it's likely a false positive.
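The tumor-vs-normal comparison described above can be sketched in Python as an interval-overlap check. This assumes both samples' calls have already been parsed into (chromosome, start, end, integer copy number) records, for example from the segment tables produced against the shared pooled reference. The `Call` type, the "same chromosome, overlapping, same direction" matching rule, half-open coordinates, and a neutral copy number of 2 (diploid autosomes) are simplifying assumptions, not CNVkit behavior.

```python
from typing import NamedTuple


class Call(NamedTuple):
    chrom: str
    start: int  # half-open interval [start, end)
    end: int
    cn: int  # integer copy number; 2 = neutral, assuming diploid autosomes


def direction(call):
    # Only called on non-neutral calls, so cn != 2 here.
    return "gain" if call.cn > 2 else "loss"


def matches(a, b):
    """Same chromosome, overlapping coordinates, same direction of change."""
    return (a.chrom == b.chrom and a.start < b.end and b.start < a.end
            and direction(a) == direction(b))


def compare_pair(tumor_calls, normal_calls):
    """Split one patient's calls, made against the same pooled reference,
    into somatic candidates, germline candidates and likely false positives."""
    tumor = [c for c in tumor_calls if c.cn != 2]    # drop neutral segments
    normal = [c for c in normal_calls if c.cn != 2]
    somatic = [t for t in tumor if not any(matches(t, n) for n in normal)]
    germline = [n for n in normal if any(matches(n, t) for t in tumor)]
    likely_false_pos = [n for n in normal if not any(matches(n, t) for t in tumor)]
    return somatic, germline, likely_false_pos
```

Tumor-only calls are the somatic candidates, normal calls echoed in the tumor are the germline candidates, and normal-only calls are treated as likely false positives, following the reasoning in the answer.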

If you think you've found a real cancer-associated germline CNV in your sample, check that it makes biological sense: it should affect a cancer-associated gene in the right direction (e.g. hemizygous loss of a tumor suppressor), and if you have access to any clinical information about the patient, you would expect to see a family history of cancer, early onset of disease, or other cancer-related conditions.
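The "right direction" check can be written as a small Python helper. The gene-role table here is a hypothetical three-gene example for illustration only; a real analysis would look roles up in a curated cancer gene resource, and this check covers only the direction of the change, not the clinical evidence (family history, age at onset).

```python
# Hypothetical, deliberately tiny gene-role table for illustration only.
GENE_ROLE = {
    "TP53": "tumor_suppressor",
    "BRCA1": "tumor_suppressor",
    "MYC": "oncogene",
}


def makes_biological_sense(gene: str, cn: int) -> bool:
    """True if the copy-number change points the expected way for the gene's
    role: loss (cn < 2) of a tumor suppressor, or gain (cn > 2) of an oncogene.
    Assumes a diploid baseline of 2 copies."""
    role = GENE_ROLE.get(gene)
    if role == "tumor_suppressor":
        return cn < 2
    if role == "oncogene":
        return cn > 2
    return False  # gene not in the table: this check offers no support


if __name__ == "__main__":
    print(makes_biological_sense("TP53", 1))  # hemizygous loss of a tumor suppressor
```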

stephaniem (Boston, MA) replied, 19 hours ago:

Thank you so much, that is extremely helpful!
