I am trying to integrate CNVkit into our in-house clinical exome pipeline. We mostly analyse various monogenic and polygenic diseases, so there are no tumor/normal pairs or anything like that.
All data is generated on the same sequencing platform (NovaSeq 6000), with the same kit and the same bioinformatics (bfx) pipeline.
My idea was to use a batch of our previously sequenced samples to build a "normal" reference. I followed the README as closely as possible and filtered out any samples whose values clearly deviated in "metrics". This left me with 125 samples, from which I built a "normal" reference using the bait BED file for our exome kit and the batch command, leaving all other settings at their defaults.
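The sample-filtering step can be made reproducible rather than done by eye. Below is a minimal sketch, assuming the tab-separated table written by "metrics" has a `sample` column and a `mad` spread column (check the header of your own file). The function name `select_reference_samples`, the choice of column and the cutoff of median + 3 × MAD across samples are all illustrative assumptions, not CNVkit behaviour.

```python
import csv
import io
import statistics

# Hypothetical example of a metrics table (assumed column layout).
EXAMPLE_TSV = (
    "sample\tsegments\tstdev\tmad\tiqr\tbivar\n"
    "S1\t40\t0.20\t0.15\t0.20\t0.16\n"
    "S2\t38\t0.21\t0.16\t0.21\t0.17\n"
    "S3\t45\t0.19\t0.14\t0.19\t0.15\n"
    "S4\t120\t0.60\t0.55\t0.60\t0.56\n"
)


def select_reference_samples(metrics_tsv: str, column: str = "mad", k: float = 3.0) -> list[str]:
    """Return sample names whose spread is not a high outlier in the cohort.

    Hypothetical helper: a sample is dropped when its value in `column`
    exceeds median + k * MAD, where MAD is the median absolute deviation
    of that column across all samples in the table.
    """
    rows = list(csv.DictReader(io.StringIO(metrics_tsv), delimiter="\t"))
    values = [float(row[column]) for row in rows]
    center = statistics.median(values)
    mad = statistics.median(abs(v - center) for v in values)
    cutoff = center + k * mad
    return [row["sample"] for row in rows if float(row[column]) <= cutoff]


if __name__ == "__main__":
    # S4 has a much larger spread than the rest and is excluded.
    print(select_reference_samples(EXAMPLE_TSV))
```

Using a robust cutoff (median and MAD) rather than mean and standard deviation keeps a few very noisy samples from inflating the threshold that is meant to catch them.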
My question is this: when I look at the scatter plot of a test sample against my normal reference (after segmetrics and call), it looks very noisy. Is this expected? Or are there any knobs I should turn to clean this up? I am guessing that would mean going back to the reference step.