We use CNVkit for copy number analysis of tumour samples. The output is generally reliable, but we're seeing artefacts in every sample which (we think) arise because the pooled reference is built from our paired normals, which are blood-derived, whereas the tumours are all FFPE. Every tumour sample has some probes that differ systematically from the blood-derived reference.
Ideally we'd generate a normal panel from FFPE normals, but we don't have the tissue to do this.
We've gone through various thought processes about how to correct the reference file so that it more accurately represents the tumours.
We have a large collection of tumour samples that we'd like to make a reference from. However, every sample has large-scale copy number changes in at least one chromosome, and often in many, so we can't just use a few well-behaved tumour BAMs as the reference.
One approach we considered is to do a first-pass analysis with CNVkit, identify the normal-ploidy chromosomes with no structural changes in each sample, and then use just those regions from each sample to build the reference file.
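To make the first-pass filtering step concrete, here is a rough sketch of what I have in mind for one sample. It assumes the first-pass segments have already been read into `(chrom, start, end, log2)` tuples; the function name `neutral_chromosomes` and the `tol=0.2` cut-off are my own placeholders, not anything from CNVkit.

```python
from collections import defaultdict


def neutral_chromosomes(segments, tol=0.2):
    """Return chromosomes where every segment has |log2| < tol.

    segments: iterable of (chrom, start, end, log2) tuples for ONE sample.
    tol: hypothetical tolerance around log2 = 0; would need tuning.
    """
    by_chrom = defaultdict(list)
    for chrom, _start, _end, log2 in segments:
        by_chrom[chrom].append(log2)
    # A chromosome qualifies only if none of its segments is shifted.
    return {c for c, vals in by_chrom.items() if all(abs(v) < tol for v in vals)}


# Toy example: chr1 is flat, chr2 is gained, chr3 has a partial loss.
example = [
    ("chr1", 0, 100, 0.05),
    ("chr1", 100, 200, -0.1),
    ("chr2", 0, 150, 0.6),
    ("chr3", 0, 50, 0.0),
    ("chr3", 50, 90, -0.8),
]
usable = neutral_chromosomes(example)
```

This requires a whole chromosome to be flat, which is stricter than keeping individual neutral segments, and it trusts that the first pass centred the sample's log2 ratios on the true neutral state.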
I'm not sure CNVkit will be happy building a reference file in this way, though: each chromosome in the reference would be built from a different subset of samples, rather than every sample contributing genome-wide. I haven't delved into the code to find out whether this is feasible, but I suspect it's not built to do this.
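If CNVkit itself can't do it, the pooling could in principle be done outside it, bin by bin. The sketch below is only an illustration under my own assumptions: per-bin log2 coverages are held as `(chrom, start, end, log2)` tuples, `neutral` maps each sample to the chromosomes judged copy-neutral, and `pooled_reference` and `min_samples` are hypothetical names. It does not write a file CNVkit could read, and it skips the GC and other bias corrections a real reference carries.

```python
from collections import defaultdict
from statistics import median


def pooled_reference(samples, neutral, min_samples=3):
    """Median log2 per bin, using only samples neutral on that chromosome.

    samples: {sample_id: [(chrom, start, end, log2), ...]}
    neutral: {sample_id: set of copy-neutral chromosome names}
    """
    per_bin = defaultdict(list)
    for sample_id, bins in samples.items():
        keep = [b for b in bins if b[0] in neutral.get(sample_id, set())]
        if not keep:
            continue
        # Re-centre each sample on its neutral bins so samples are comparable.
        centre = median(b[3] for b in keep)
        for chrom, start, end, log2 in keep:
            per_bin[(chrom, start, end)].append(log2 - centre)
    # Drop bins supported by too few samples to give a stable median.
    return {k: median(v) for k, v in per_bin.items() if len(v) >= min_samples}


# Toy data: the second chr1 bin is systematically low; s1 has a chr2 gain.
samples = {
    "s1": [("chr1", 0, 100, 0.0), ("chr1", 100, 200, -0.4), ("chr2", 0, 100, 0.9)],
    "s2": [("chr1", 0, 100, 0.1), ("chr1", 100, 200, -0.3), ("chr2", 0, 100, -0.1)],
    "s3": [("chr1", 0, 100, -0.1), ("chr1", 100, 200, -0.5), ("chr2", 0, 100, -0.3)],
}
neutral = {"s1": {"chr1"}, "s2": {"chr1", "chr2"}, "s3": {"chr1", "chr2"}}
ref = pooled_reference(samples, neutral, min_samples=3)
```

In the toy data the chr2 bin is dropped because only two samples are neutral there, which is exactly the coverage problem I'd expect with real tumours: some chromosomes may have too few neutral samples to pool.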
Does anyone have experience of this, or another solution to the problem?