Question: CNVkit for germline samples
0
gravatar for merella.stefania
10 days ago by
merella.stefania0 wrote:

Dear all, I am trying to use CNVkit to detect copy number variant in my samples. My sequencing data is derived from peripheral blood so I am looking for germline copy number variants. I don't have tumor sample, only germline.

I followed the suggestions of the documentation and here is the flow of the pipeline that I used to analyse my data:

python3 /usr/local/cluster/bin/cnvkit.py batch germline_sample.bam --n --targets brca_slop.bed --fasta hg19.fa --output-reference germline_sample.cnn --output-dir cnvkit_brca/
python3 /usr/local/cluster/bin/cnvkit.py segmetrics -s germline_sample.cn{s,r} --ci
python3 /usr/local/cluster/bin/cnvkit.py call germline_sample.segmetrics.cns --filter ci -o germline_sample.calls.cns

As test I used a sample that I know has a deletion of the exon 23 in BRCA1 gene. But I did not found any result:

python3 /usr/local/cluster/bin/cnvkit.py breaks germline_sample.cnr germline_sample.calls.cns 
Found 0 gene breakpoints
gene    chromosome      location        change  probes_left     probes_right
python3 /usr/local/cluster/bin/cnvkit.py gainloss germline_sample.cnr -s germline_sample.calls.cns 
*WARNING* No chrX found in probes; check the input
Treating sample gremline_sample as male
*WARNING* No chrX found in probes; check the input
*WARNING* No chrX found in probes; check the input
*WARNING* No chrX found in probes; check the input
Found 2 gene-level gains and losses
gene    chromosome      start   end     log2    depth   weight  cn      probes
BRCA2   chr13   32890547        32972958        -0.243229       418.202 36.7767 2       269
BRCA1   chr17   41197644        41276164        -0.795891       284.416 22.3257 1       309

Definitely I am missing something. Is my flow correct? Can someone help me in understand what is wrong and how I have to use CNVkit on my data? Thanks for the help.

Stefania

ADD COMMENTlink modified 9 days ago by Eric T.1.8k • written 10 days ago by merella.stefania0
0
gravatar for Eric T.
9 days ago by
Eric T.1.8k
San Francisco, CA
Eric T.1.8k wrote:

Your workflow is correct, but if this is exome sequencing data, then CBS segmentation won't be able to pick up a single-exon deletion. However, you can still see each bin's log2 value in the scatter plots, if you're reviewing genes of interest that way.

I've added a script cn_ztest.py to perform a simple per-bin test on the bins in a .cnr file: https://github.com/etal/cnvkit/blob/master/scripts/cn_ztest.py

(Pull the latest from the GitHub repo to ensure it all works as expected.)

This test will perform best if you constructed a pooled reference from multiple normal samples (the more the better). The output is a listing of the bins that have a log2 value farther from the neutral value than would be expected by chance, based on the reference spread (back-calculated from the .cnr "weight" column). Use the -t option to ignore off-target bins, and -a to specify a significance threshold other than 0.005.

python cn_ztest.py germline_sample.cnr -t -o germline_sample.ztest.cnr

It will also tend to report many of the same genes that were detected in your .cns file, so try comparing the two.

ADD COMMENTlink modified 9 days ago • written 9 days ago by Eric T.1.8k

Hi Eric, Many thanks for your answer. I am not using whole exome data but I have shorter sequencing panel. I tried to analyse data as you suggested creating a pooled reference using the other samples as normal samples (in total 12 samples)

python3 /usr/local/cluster/bin/cnvkit.py batch germline_sample.bam --normal BAM_NORMAL/*.bam --targets brca_slop.bed --fasta hg19.fa --output-reference brca.reference.cnn --output-dir cnvkit_brca/

Then I used your new script to investigate log2 value, but I obtained 0 significant results. I also tried increasing the threshold with -a argument:

python3 ../cnvkit-master/scripts/cn_ztest.py germline_sample.cnr -t -a 0.05 -o germline_sample.ztest.cnr
*WARNING* No chrX found in probes; check the input
Treating sample germline_sample as male
Ignoring 241 off-target bins
Significant hits in 0/86 bins (0%)

Do you think that it is a problem related with the limited dimension of the sequencing panel? Or, as you already suggested, the deletion is too small to be found by CNVkit. If CNVkit is not the correct tool to use in my case, do you have any suggestion about other tools?

Thank you so much for the help.

Stefania

ADD REPLYlink written 6 days ago by merella.stefania0

The small panel size shouldn't be a problem. But this test isn't especially sensitive. A couple questions:

  • Was this sequencing done by hybrid capture or targeted amplicon sequencing? The small panel size and small number of off-target bins suggests the latter. If so, use the batch command option --method amplicon to reduce the chance of errata in your output files.

  • What do you see if you use an unreasonably high alpha, e.g. -a 0.9? The output .cnr file will have an extra field called ztest, listing the p-values for each bin. If these are all 1.0, then something went wrong -- check brca.reference.cnn to see if the spread column there is all zeros, because that would mean the pooled reference was not built properly and the observed variance at each bin was not recorded for the Z-test. If you do see decimal p-values in your output .cnr file, you can jump to BRCA1 exon 23 and see the p-value there.

Note that this script adjusts p-values for multiple hypothesis tests, so it's possible that the log2 read depth ratio at BRCA1 exon 23 was not significantly below the range of normal samples at that site. You can do some of the math by hand: in brca.reference.cnn, find the bin of interest, then look at the values in the log2 and spread columns. If the log2 value in your test sample was less than the corresponding reference log2 value minus twice the spread value, then the Z-test has a chance of picking it up, otherwise the script will probably miss it.

ADD REPLYlink written 4 days ago by Eric T.1.8k

One more thought: Since BRCA1 is already detected as hemizygous in this sample, you can use the -s option to cn_ztest.py to make the p-values relative to the surrounding segment, rather than to the genome-wide neutral copy number value (i.e. diploid). That will actually make the test less sensitive to a deep deletion within BRCA1, but in any case, it's relevant.

You should still be able to see the BRCA1 exon 23 in a scatter plot:

cnvkit.py scatter germline_sample.cnr -s germline_sample.cns -g BRCA1

If it's not showing up there, that bin may have been filtered out earlier in the pipeline for reasons that may be clear if you post the corresponding row in brca.reference.cnn here.

ADD REPLYlink written 4 days ago by Eric T.1.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 744 users visited in the last hour