Context: I'm a new CNVkit user (using v0.9.6). I have 6 exome-seq samples (from saliva DNA) that I want to run germline copy number calling on. 2 of the samples are from healthy individuals and the other 4 are from individuals with breast cancer, all in the same extended family pedigree.
What I did so far: I ran the normal CNVkit pipeline with a flat reference and made sure to pass the --access option with the access-5kb-mappable.hg19.bed file mentioned in the docs, to exclude poorly mappable regions. I ran segmetrics with the 'ci' option, then made a scatter plot (scatter plot output seen here). I also ran the pipeline again, this time using my 2 healthy samples to build a pooled reference and comparing the individuals with breast cancer in the same family against it (scatter plot output seen here). The data looks very noisy in both cases (see photos linked above).
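The two runs described above correspond roughly to the commands sketched below. This is only an illustration, not the exact commands that were run: every file and directory name except access-5kb-mappable.hg19.bed is a hypothetical placeholder, and the script only prints the command lines rather than executing CNVkit.

```python
import shlex

# Hypothetical placeholder names; none of these paths come from the post,
# except the access file, which is the one named in the CNVkit docs.
TARGETS = "exome_targets.bed"
FASTA = "hg19.fa"
ACCESS = "access-5kb-mappable.hg19.bed"
HEALTHY = ["healthy1.bam", "healthy2.bam"]
AFFECTED = ["affected1.bam", "affected2.bam", "affected3.bam", "affected4.bam"]


def batch_flat(bams, outdir="results_flat"):
    """Run 1: all samples against a flat reference (-n with no normal BAMs)."""
    return ["cnvkit.py", "batch", *bams, "-n",
            "-t", TARGETS, "-f", FASTA, "--access", ACCESS,
            "--output-reference", "flat_reference.cnn", "-d", outdir]


def batch_pooled(cases, normals, outdir="results_pooled"):
    """Run 2: affected samples against a reference pooled from the healthy samples."""
    return ["cnvkit.py", "batch", *cases, "--normal", *normals,
            "--targets", TARGETS, "--fasta", FASTA, "--access", ACCESS,
            "--output-reference", "pooled_reference.cnn", "--output-dir", outdir]


def segmetrics_ci(sample, outdir):
    """Add confidence intervals to the segments of one sample."""
    return ["cnvkit.py", "segmetrics", f"{outdir}/{sample}.cnr",
            "-s", f"{outdir}/{sample}.cns", "--ci",
            "-o", f"{outdir}/{sample}.segmetrics.cns"]


def scatter(sample, outdir):
    """Scatter plot of bin-level log2 ratios with the segments overlaid."""
    return ["cnvkit.py", "scatter", f"{outdir}/{sample}.cnr",
            "-s", f"{outdir}/{sample}.segmetrics.cns",
            "-o", f"{outdir}/{sample}-scatter.pdf"]


if __name__ == "__main__":
    commands = [
        batch_flat(HEALTHY + AFFECTED),
        batch_pooled(AFFECTED, HEALTHY),
        segmetrics_ci("affected1", "results_pooled"),
        scatter("affected1", "results_pooled"),
    ]
    for cmd in commands:
        # Print only; pass cmd to subprocess.run to actually execute it.
        print(shlex.join(cmd))
```

The only difference between the two batch calls is whether any BAM files follow the normal-sample option; with none, CNVkit builds a flat reference instead of a pooled one.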
Can anybody help me understand why my data is so noisy, and what the grey and orange bars/lines represent? I suspect it has to do with the reference (or rather the lack of a good one). I want to retry with some exome-seq samples that are unrelated to breast cancer and build a reference from those individuals, but I am not sure how much that would help, or whether the reference is the issue to begin with.
What are the grey and orange bars on the scatter plot? On https://cnvkit.readthedocs.io/en/stable/plots.html I only see red bars, which are described as the "segmentation line", but I am not entirely sure what that means, why my plot has bars in 2 colors, or why neither of them is red. I am using v0.9.6 of CNVkit.
Any help is greatly appreciated. Thank you.