Question: CNVkit - how many normal samples do I need for the pooled normal reference?
ibphuangchen (United States) wrote, 2.4 years ago:

Hi guys,

I'm using CNVkit to process my WES data to call copy number alterations. I have 110 tumor samples, each with a matched normal. CNVkit recommends using a pooled normal reference. Do I need to use all 110 normal .bam files to generate this reference, or would a subset be sufficient? How does the choice of normal samples affect the result?

I'm asking for two reasons. First, building the reference from only a few of the normals would save a lot of disk space at this step. Second, even if I use all 110 normal samples to construct the pooled reference, the reference would still be different when I move on to another cohort with the same disease.

Eric T. (San Francisco, CA) wrote, 2.4 years ago:

You can use any number of normal samples with the reference command. You don't need access to all 110 normal samples at the same time; you can first run the coverage command twice on each BAM (once each for targets and antitargets), then collect the 'coverage' output .cnn files to use as input to the 'reference' command.
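
As a rough sketch of that two-step workflow, the Python below only assembles the command lines rather than running them. The BED, FASTA, and output names (`targets.bed`, `antitargets.bed`, `genome.fa`, `pooled_reference.cnn`, the `cov` directory) and the two helper functions are placeholders invented for illustration; substitute the files from your own setup.

```python
from pathlib import Path


def coverage_commands(bam, targets_bed, antitargets_bed, out_dir="."):
    """Return the two `cnvkit.py coverage` command lines for one BAM.

    One command covers the target bins, the other the antitarget bins.
    """
    sample = Path(bam).stem  # e.g. "N001" from "N001.bam"
    out = Path(out_dir)
    return [
        ["cnvkit.py", "coverage", str(bam), targets_bed,
         "-o", str(out / f"{sample}.targetcoverage.cnn")],
        ["cnvkit.py", "coverage", str(bam), antitargets_bed,
         "-o", str(out / f"{sample}.antitargetcoverage.cnn")],
    ]


def reference_command(cnn_files, fasta, output="pooled_reference.cnn"):
    """Return the `cnvkit.py reference` command that pools the .cnn files."""
    return ["cnvkit.py", "reference", *cnn_files, "-f", fasta, "-o", output]


if __name__ == "__main__":
    # Hypothetical normal BAMs; each one can be processed independently,
    # so the BAMs never need to be on disk at the same time.
    normals = ["N001.bam", "N002.bam"]
    cnn_files = []
    for bam in normals:
        for cmd in coverage_commands(bam, "targets.bed", "antitargets.bed", "cov"):
            print(" ".join(cmd))
            cnn_files.append(cmd[-1])  # the -o value is the last argument
    print(" ".join(reference_command(cnn_files, "genome.fa")))
```

Only the small .cnn files have to be kept around for the final `reference` step, which is where the disk savings come from.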

If you're using the batch command to get quick initial results, you can just select 10-20 of your normal BAMs to use as a pooled reference. (List your BAM files by size with ls -Sl, then choose 10 to 20 samples from the middle of the list.) This pooled reference can be used for all of your tumor samples. If you decide later that you want a larger pool, you can run 'coverage' on additional samples and use those output .cnn files along with those from your existing pool to expand the reference.
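
The "middle of the list" selection could be scripted along these lines. This is a minimal sketch: `middle_by_size` and `batch_command` are made-up helper names, and the file names in the demo are placeholders. The `batch` options used (`--normal`, `--targets`, `--fasta`, `--output-reference`, `--output-dir`) are the standard ones, but check them against your installed CNVkit version.

```python
import os


def middle_by_size(bam_paths, k, size_of=os.path.getsize):
    """Pick k files from the middle of the size-sorted list.

    Sorting largest-first mirrors `ls -S`; taking the central slice
    skips the unusually large and unusually small BAMs.
    """
    ranked = sorted(bam_paths, key=size_of, reverse=True)
    k = min(k, len(ranked))
    start = (len(ranked) - k) // 2
    return ranked[start:start + k]


def batch_command(tumor_bams, normal_bams, targets_bed, fasta,
                  reference_out="pooled_reference.cnn", out_dir="results"):
    """Assemble a `cnvkit.py batch` command line (not executed here)."""
    return ["cnvkit.py", "batch", *tumor_bams,
            "--normal", *normal_bams,
            "--targets", targets_bed,
            "--fasta", fasta,
            "--output-reference", reference_out,
            "--output-dir", out_dir]


if __name__ == "__main__":
    # Fake sizes stand in for real files so the sketch runs anywhere.
    fake_sizes = {f"normal_{i:03d}.bam": i * 10**8 for i in range(1, 111)}
    pool = middle_by_size(list(fake_sizes), 20, size_of=fake_sizes.get)
    cmd = batch_command(["tumor_001.bam"], pool, "targets.bed", "genome.fa")
    print(" ".join(cmd))
```

With real files, drop the `size_of` argument so the sizes come from `os.path.getsize`.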

The coverage profile tends to depend on lab protocols and reagents, not disease -- anyway, the normal samples are from cells without disease, right? Separate references for fresh-frozen versus FFPE material would be worthwhile, and you should also separate them by exome capture kit if that's not the same across your cohort.
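
If the cohort does mix materials or capture kits, one way to keep the pools apart is to group the normals before building each reference. The sketch below assumes a hypothetical sample sheet with `bam`, `material`, and `kit` fields; the field names, kit labels, and reference file naming are all invented for illustration.

```python
from collections import defaultdict


def group_normals(samples):
    """Group normal BAMs by (material, capture kit).

    Each group would get its own pooled reference.
    """
    groups = defaultdict(list)
    for s in samples:
        groups[(s["material"], s["kit"])].append(s["bam"])
    return dict(groups)


def reference_name(material, kit):
    """Hypothetical naming scheme for the per-group reference file."""
    return f"reference_{material}_{kit}.cnn"


if __name__ == "__main__":
    # Made-up sample sheet; real values come from your lab records.
    sheet = [
        {"bam": "N001.bam", "material": "fresh_frozen", "kit": "kitA"},
        {"bam": "N002.bam", "material": "FFPE", "kit": "kitA"},
        {"bam": "N003.bam", "material": "fresh_frozen", "kit": "kitA"},
        {"bam": "N004.bam", "material": "fresh_frozen", "kit": "kitB"},
    ]
    for (material, kit), bams in group_normals(sheet).items():
        print(reference_name(material, kit), "<-", ", ".join(bams))
```

Each tumor sample would then be compared against the reference built from normals of the same material and kit.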


Hi Eric,

Thanks for the prompt reply. My normal samples are indeed blood cells from the patients. I was concerned that using only 10-20 of the normal BAMs might not be sufficient to represent all the normal samples in the cohort. I understand that a pooled normal reference should only be used for a specific sample type and a specific experimental condition. I was just a little worried that there might be some heterogeneity among the normal samples: if you don't include all of them in the pooled reference, you might lose some information from the ones left out. Thanks again!

Reply written 2.4 years ago by ibphuangchen

Yes, that's all true, but 10-20 samples is still usually good enough to capture most of the consistent characteristics and biases of your lab process, and it lets you quickly "peek" at the results with the batch command without wasting much computation. For a clinical pipeline, I would recommend you run coverage on each of the remaining normal samples to build a comprehensive pooled reference.

Reply written 2.4 years ago by Eric T.