Question: CNVkit - Proper Way to Generate Accurate Sex Chromosome Predictions
4.8 years ago by
fongchunchan10 wrote:

I was wondering what the best protocol is for getting accurate sex chromosome copy number predictions.

Currently I have a pool of normals of mixed gender, which I pass to the reference command. I don't set the -y option to create a male reference. There appears to be no option to specify the gender of each input coverage file. For the -y option, the manual says:

Create a male reference: shift female samples' chrX log-coverage by -1, so the reference chrX average is -1. Otherwise, shift male samples' chrX by +1, so the reference chrX average is 0.
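
To make the quoted rule concrete, here is a minimal sketch of the arithmetic it describes. The function name and its arguments are hypothetical and are not part of CNVkit's API; the sketch assumes an idealized female sample has chrX log2 coverage of 0 and an idealized male sample has -1.

```python
def adjust_chrx_log2(log2_chrx, is_female, male_reference=False):
    """Shift one sample's chrX log2 coverage as the manual text describes.

    Hypothetical helper for illustration only, not CNVkit's implementation.
    """
    if male_reference and is_female:
        # With -y: female samples are shifted down so the reference chrX average is -1.
        return log2_chrx - 1.0
    if not male_reference and not is_female:
        # Without -y: male samples are shifted up so the reference chrX average is 0.
        return log2_chrx + 1.0
    # The sample already matches the reference's expected chrX level.
    return log2_chrx


# Idealized inputs: female chrX at 0, male chrX at -1.
for is_female, log2 in [(True, 0.0), (False, -1.0)]:
    for male_reference in (False, True):
        print(is_female, male_reference, adjust_chrx_log2(log2, is_female, male_reference))
```

Under these assumptions, both sexes end up at the same chrX level within a given reference type, which is why a mixed-gender pool can be combined.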

I assume that it automatically detects the gender and accounts for their sex chromosomes? Or is there a way to pass in the exact gender of each input normal sample?

I then pass this pooled reference to fix. When it comes to the call command, there is a -g option to specify the gender of the input sample, which the reference command lacks.

Are there any critical steps that I am missing for getting accurate sex chromosome copy number predictions?

4.8 years ago by
Eric T.2.6k
San Francisco, CA
Eric T.2.6k wrote:

The reference command detects each input sample's chromosomal gender and adjusts for it automatically, whether or not -y was given; without -y, all samples are converted to XX. The detection is pretty reliable. The reference samples are supposed to be generally copy-number-neutral, so if a sample has Turner syndrome or large-scale CNVs on chromosome X, it probably shouldn't be in the reference pool. The reference command prints the gender detected for each sample when it runs, so I recommend checking the log messages to confirm that every sample's gender was detected correctly, then proceeding with the rest of the pipeline.

(But if the gender calls are incorrect for multiple samples for no clear reason, please let me know.)
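
Checking the log by eye is fine for a few samples; for a larger pool, a small script can collect the calls. The sketch below assumes the log lines look like the ones quoted later in this thread ("Relative log2 coverage of X=..., Y=... (...) --> assuming male/female"). That format may differ between CNVkit versions, and the function name is hypothetical.

```python
import re

# Pattern based on the log lines quoted in this thread; an assumption, not a stable format.
LOG_PATTERN = re.compile(
    r"Relative log2 coverage of X=(?P<x>-?\d+(?:\.\d+)?), "
    r"Y=(?P<y>-?\d+(?:\.\d+)?) .*--> assuming (?P<sex>male|female)"
)


def parse_gender_calls(log_text):
    """Return a list of (x_log2, y_log2, called_sex) tuples found in the log text."""
    calls = []
    for line in log_text.splitlines():
        match = LOG_PATTERN.search(line)
        if match:
            calls.append((float(match["x"]), float(match["y"]), match["sex"]))
    return calls


if __name__ == "__main__":
    import sys

    # Usage: pipe the saved log of the reference command into this script.
    for x_log2, y_log2, sex in parse_gender_calls(sys.stdin.read()):
        print(f"X={x_log2}\tY={y_log2}\t{sex}")
```

Comparing the printed calls against the known gender of each sample shows quickly whether any input was misclassified.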


@Etal: there is indeed a problem with the automatic gender detection when using the reference command.

My sample is from a male. The command I use is:

    cnvkit.py reference \
        *coverage.cnn \
        --fasta ref.fasta \
        -y \
        -o normal_ref.cnn

For the targetcoverage.cnn file, the detected gender is wrong:

Relative log2 coverage of X=-1.32, Y=-13.4 (maleness=0.501 x 1.45 = 0.728) --> assuming female

For the antitargetcoverage.cnn file, the detected gender is correct:

Relative log2 coverage of X=-1, Y=-1.26 (maleness=0.632 x 2.85 = 1.8) --> assuming male

Since I have only one normal sample, I cannot discard it from the reference pool. Is there a way to edit the script to bypass the automatic gender detection?

(modified 4.3 years ago • written 4.3 years ago by user31888100)

Thanks, I'll see about adding a --gender option to the reference command in the next release.

Workaround: It looks like the coverage of the Y-chromosome targets in your sample was poor. Look at the 'log2' column in normal_ref.cnn to identify the targets on Y that were poorly captured in your normal sample, then delete those targets from your target BED file or the source targetcoverage.cnn files (make sure they all match) and rebuild the reference. If only the well-captured targets on Y remain, gender detection should work better. (If no Y targets remain, the pipeline will still work.)
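
The first step of this workaround, finding the poorly captured Y targets, could be sketched as follows. This assumes the .cnn file is tab-separated with a header row containing 'chromosome', 'start', 'end' and 'log2' columns; only the 'log2' column is named in this thread, so check the header of your own file. The function name and the -5.0 cutoff are hypothetical; pick a threshold after looking at your data.

```python
import csv
import io


def poorly_covered_y_targets(cnn_text, min_log2=-5.0, y_names=("Y", "chrY")):
    """List (chromosome, start, end) for Y targets whose log2 is below min_log2.

    Assumes tab-separated text with 'chromosome', 'start', 'end', 'log2' columns.
    The default cutoff is an arbitrary placeholder, not a CNVkit recommendation.
    """
    reader = csv.DictReader(io.StringIO(cnn_text), delimiter="\t")
    poor = []
    for row in reader:
        if row["chromosome"] in y_names and float(row["log2"]) < min_log2:
            poor.append((row["chromosome"], int(row["start"]), int(row["end"])))
    return poor
```

For example, calling `poorly_covered_y_targets(open("normal_ref.cnn").read())` would return the coordinates to remove from the target BED file (and from the matching coverage files) before rebuilding the reference.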

I changed the statistical test in the development version of CNVkit on GitHub, so if you're able to try that it might deliver a better result. But given that the majority of targets on Y had poor coverage, it might still be misled into thinking there is no Y chromosome in your sample.

To hard-code your sample's gender in the script, you can edit cnvlib/ line 99 or so, where it says:

is_sample_female = cnarr.guess_xx()

Replace the method call with False to treat the sample as male.
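
If you would rather keep the automatic detection as a fallback than delete it, the same edit can be wrapped in a small helper. This is only a sketch: `resolve_is_female` and its `forced_gender` argument are hypothetical names, not part of CNVkit. The only thing it relies on from the library is the `guess_xx()` method shown above.

```python
def resolve_is_female(cnarr, forced_gender=None):
    """Return True for female, False for male.

    Hypothetical helper: uses the caller-supplied gender when given,
    otherwise falls back to the automatic cnarr.guess_xx() call.
    """
    if forced_gender is None:
        return cnarr.guess_xx()
    gender = forced_gender.strip().lower()
    if gender == "female":
        return True
    if gender == "male":
        return False
    raise ValueError(f"forced_gender must be 'male' or 'female', got {forced_gender!r}")
```

The line in the script would then read `is_sample_female = resolve_is_female(cnarr, "male")` for a known male sample.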

(written 4.3 years ago by Eric T.2.6k)

OK, I'll try that. Thanks!

(written 4.3 years ago by user31888100)