I'm using CNVkit to process my WES data and call copy number alterations. I have 110 tumor samples, each with a matched normal. CNVkit recommends using a pooled normal as the reference. Do I need to use all 110 normal .bam files to build this reference, or would a subset of them be sufficient? How does the choice of normal samples affect the results?
I'm asking for two reasons. First, building the reference from only some of the normals would save a lot of disk space at this step. Second, even if I use all 110 normal samples to build the pooled reference, the reference will be different anyway when I move on to another cohort with the same disease.
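
To make the question concrete, here is a minimal sketch of what I mean by building the reference from only a subset of the normals. It picks a reproducible random subset and assembles a `cnvkit.py batch` command that only builds the pooled reference. The file names (`normal_001.bam`, `my_baits.bed`, `hg19.fa`, `pooled_reference_20.cnn`), the helper functions, and the subset size of 20 are all placeholders I made up for illustration, not a recommendation, and I'm assuming `batch` can be run with only `--normal` samples.

```python
import random


def pick_normals(normal_bams, k, seed=0):
    """Return a reproducible random subset of k normal BAM paths."""
    rng = random.Random(seed)  # fixed seed so the same subset is picked each run
    return sorted(rng.sample(sorted(normal_bams), k))


def build_reference_cmd(normals, targets_bed, fasta, out_cnn):
    """Assemble (but do not run) a CNVkit command that builds a pooled reference."""
    return [
        "cnvkit.py", "batch",
        "--normal", *normals,           # normals pooled into the reference
        "--targets", targets_bed,       # capture bait/target intervals
        "--fasta", fasta,               # reference genome
        "--output-reference", out_cnn,  # where the pooled reference is written
    ]


# Hypothetical file names standing in for my 110 matched normals.
all_normals = [f"normal_{i:03d}.bam" for i in range(1, 111)]

# 20 is an arbitrary subset size chosen only for this example.
subset = pick_normals(all_normals, k=20)
cmd = build_reference_cmd(subset, "my_baits.bed", "hg19.fa", "pooled_reference_20.cnn")
print(" ".join(cmd))
```

Changing `k` or `seed` would give different pooled references, and what I'd like to understand is how much that changes the downstream calls.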