Question: Noisy germline CNV data using CNVKit
15 months ago, omg what am I doing wrote:

Context: I'm a new CNVKit user (using 0.9.6). I have 6 exome-seq samples (from saliva DNA) that I want to do germline copy number calling on. 2 of the samples are from saliva of healthy individuals and the other 4 are from saliva of individuals with breast cancer, all in the same extended family pedigree.

What I did so far: I ran the normal CNVKit pipeline with a flat reference and made sure to use the --access option with the access-5kb-mappable.hg19.bed file mentioned in the docs, to exclude poorly mappable regions. I ran segmetrics with the --ci option, then made a scatter plot (scatter plot output seen here). I also ran the pipeline again, this time using my 2 healthy samples to build a copy number reference and comparing the individuals with breast cancer in the same family against it (scatter plot output seen here). The data looks very noisy in both cases (see the plots linked above).
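
For readers following along, the segmetrics and scatter steps described above could look roughly like the sketch below. The asker's exact commands were not posted, so the function name `segmetrics_and_scatter`, the `CNVKIT` override variable, and the file naming (`<sample>.cnr` and `<sample>.cns`, as written by `batch`) are assumptions for illustration.

```shell
# Hypothetical helper: add confidence intervals to one sample's segments, then plot.
# $1 is the sample prefix; "$1.cnr" and "$1.cns" are assumed to exist from a prior `batch` run.
# CNVKIT is an assumed override hook (CNVKIT=echo prints the commands instead of running them).
segmetrics_and_scatter() {
    s="$1"
    # Annotate each segment with a confidence interval (--ci)
    "${CNVKIT:-cnvkit.py}" segmetrics "$s.cnr" -s "$s.cns" --ci -o "$s.segmetrics.cns" || return 1
    # Plot bin-level log2 ratios with the annotated segments overlaid
    "${CNVKIT:-cnvkit.py}" scatter "$s.cnr" -s "$s.segmetrics.cns" -o "$s-scatter.pdf"
}
```

Calling `segmetrics_and_scatter results/Sample1` would write `results/Sample1.segmetrics.cns` and `results/Sample1-scatter.pdf`; setting `CNVKIT=echo` first prints the two commands without running them.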

Questions:

  1. Can anybody help me understand why my data is so noisy, and what the grey and orange bars/lines represent? I suspect it has to do with the reference (or rather the lack of a good reference...). I want to retry with some exome-seq samples that are unrelated to breast cancer and build a reference from those individuals, but I am not sure how much that would help, or whether the reference is the issue to begin with.

  2. What are the grey and orange bars on the scatter plot? On https://cnvkit.readthedocs.io/en/stable/plots.html I only see red bars, which are described as the "segmentation line", but I am not entirely sure what this means, or why my plot has bars in 2 colors, neither of them red. I am using v0.9.6 of CNVKit.

Any help is greatly appreciated. Thank you

10 months ago, brunobsouzaa (Brazil) wrote:

I've been using cnvkit for a few months now and, in my limited experience, I don't recommend using the flat reference. A pooled reference of at least 15 samples works best! I'm now using a pooled reference built from 161 samples, and everything is working fine for me.


If I use a pooled reference, is the paired control sample no longer used in the analysis? Can you share your command? I ran my commands as listed here, but I could not find any criterion for identifying a noisy sample.

First, I used the batch command to get the target.cnn and antitarget.cnn files for all the control samples.

command 1

cnvkit.py batch Tumor.bam --normal Normal.bam \
    --targets my_baits.bed --annotate refFlat.txt \
    --fasta hg19.fasta --access data/access-5kb-mappable.hg19.bed \
    --output-reference my_reference.cnn --output-dir results/ \
    --diagram --scatter

Second, I gathered the target.cnn and antitarget.cnn files of all the control samples into an empty directory.

command 2

cnvkit.py reference *coverage.cnn -f ucsc.hg19.fa -o Reference.cnn

Third, I could not find a way for cnvkit to use both a matched control sample and a pooled reference, the way GATK Mutect2 does, so I just pass the pooled reference and ignore the matched normal sample.

command 3

cnvkit.py batch Tumor.bam --normal Reference.cnn \
    --targets my_baits.bed --annotate refFlat.txt \
    --fasta hg19.fasta --access data/access-5kb-mappable.hg19.bed \
    --output-reference my_reference.cnn --output-dir results/ \
    --diagram --scatter

Thanks a lot. I look forward to hearing more about your experience with cnvkit.
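
On the question of a criterion for noisy samples: CNVkit has a `metrics` command that reports spread statistics of the bin-level log2 ratios relative to the segments for each sample, which gives numbers to compare across samples. A minimal sketch, assuming the `.cnr` and `.cns` files from `batch` sit side by side in one directory; the helper name `noise_report` and the `CNVKIT` override variable are hypothetical.

```shell
# Hypothetical helper: print CNVkit noise metrics for every sample in a results directory.
# CNVKIT is an assumed override hook (CNVKIT=echo prints the commands instead of running them).
noise_report() {
    dir="$1"
    for cnr in "$dir"/*.cnr; do
        # Pair each bin-level file with its segment file; skip files that have no match
        cns="${cnr%.cnr}.cns"
        [ -f "$cns" ] || continue
        "${CNVKIT:-cnvkit.py}" metrics "$cnr" -s "$cns" || return 1
    done
}
```

`noise_report results/` prints one metrics table per sample; samples whose values stand well apart from the rest of the cohort are candidates for a closer look.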

Reply written 8 days ago by linouhao

So, I've made a small change to the way I work with cnvkit... For the baseline, use all samples in the same run. This will work fine! As for my commands: first, build your reference:

cnvkit.py batch --normal $NORMAL \
    --targets $TARGETS --annotate $REFFLAT \
    --fasta $GEN_REF --access $ACCESS \
    --output-reference $CNV_REF --output-dir $OUT_DIR

Then, run the analysis using the newly created reference:

cnvkit.py batch $TESTES -r $CNV_REF -d $OUT_DIR

Last, call copy numbers:

cnvkit.py call ${i}.Tumor.cnr -y -m clonal -o ${i}.call.cnr
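
The `${i}` in the last command implies it runs inside a loop over sample names, which the reply does not show. One way to write that loop, as a sketch: the helper name `call_samples` and the `CNVKIT` override variable are assumptions, while the `call` options are copied from the command above.

```shell
# Hypothetical helper: run `cnvkit.py call` once per sample prefix passed as an argument.
# CNVKIT is an assumed override hook (CNVKIT=echo prints the commands instead of running them).
call_samples() {
    for i in "$@"; do
        # Same options as the reply: -y (male reference), -m clonal
        "${CNVKIT:-cnvkit.py}" call "${i}.Tumor.cnr" -y -m clonal -o "${i}.call.cnr" || return 1
    done
}
```

For example, `call_samples S1 S2` would read `S1.Tumor.cnr` and `S2.Tumor.cnr` and write `S1.call.cnr` and `S2.call.cnr`.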

Reply written 8 days ago by brunobsouzaa

5 months ago, sutturka (USA) wrote:

This link will answer your question regarding the grey and orange bars on the scatter plot. Searching through the GitHub issues may turn up more answers.


Thank you so much for the help.

Reply written 4 months ago by omg what am I doing