Question: Interpretation of sex chromosome ploidy using CNVKit
3.0 years ago by
andcl930
andcl930 wrote:

I was using CNVkit to visualize ploidy in whole-exome sequencing data (male samples).

The first time, I ran CNVkit without the -y option. The resulting scatter plot showed a log2 copy number ratio of 0 for all autosomes, -1 for chromosome X, and 0 for chromosome Y. The second time, I added the -y option for a male reference, and the plot showed a copy number ratio of 0 for both X and Y.

These results raised two questions:

1. For autosomes, the normal copy number ratio would be log2(2/2) = 0. Is it normal for the X chromosome to end up at log2(1/2) = -1 without the -y option? With the -y option on, chromosome X showed a copy number ratio of 0. Is there a compensation algorithm for males because they have only one X chromosome?

2. If the Y chromosome copy number ratio is 0, does that mean the sample is XYY? Or is there another algorithm that compensates for the Y chromosome being present in a single copy in normal males, which I would expect to give log2(1/2) = -1? (It showed 0 in both cases, with and without the -y option.)

sex chromosome cnvkit • 1.3k views
modified 3.0 years ago by 2nelly150 • written 3.0 years ago by andcl930
3.0 years ago by
Eric T.2.4k
San Francisco, CA
Eric T.2.4k wrote:

Running the "batch" command without the -y flag assumes a female reference, with expected/neutral ploidy of 2 for autosomes, 2 for X, and 1 for Y (for the sake of having a baseline level for comparing male samples). Using this reference, male samples without sex-chromosome abnormalities will have 1 X, 1 Y, so the log2 ratios you'll see are log2(1/2) = -1 for X, log2(1/1) = 0 for Y. A female sample will have `log2(2/2) = 0` for X, `log2(0/1) = -infinity` (in practice, just some noisy deep negative values) for Y.

Rerunning "batch" with the -y flag, the expected ploidies of the sex chromosomes are 1 for X and 1 for Y. With a male reference, a normal male sample will have `log2(1/1) = 0` for both X and Y; a normal female sample will have `log2(2/1) = +1` for X and `log2(0/1) = -infinity` (very low numbers) for Y.

So, in both cases what you're seeing is as expected. Your male samples are normal XY.
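The arithmetic above can be tabulated with a short Python sketch. It is only an illustration of the expected values, not CNVkit code; the dictionaries and the function name `expected_log2` are made up for this example, and the copy numbers are the ones stated in this answer.

```python
import math

# Expected copy numbers in the reference, as described in the answer above
# (female reference = default, male reference = -y).
REFERENCE_COPIES = {
    "female": {"autosome": 2, "X": 2, "Y": 1},
    "male": {"autosome": 2, "X": 1, "Y": 1},
}

# Copy numbers in a sample without sex-chromosome abnormalities.
SAMPLE_COPIES = {
    "female": {"autosome": 2, "X": 2, "Y": 0},
    "male": {"autosome": 2, "X": 1, "Y": 1},
}


def expected_log2(sample_sex, reference_sex, chrom):
    """Return the idealized log2 ratio for one chromosome class."""
    sample = SAMPLE_COPIES[sample_sex][chrom]
    reference = REFERENCE_COPIES[reference_sex][chrom]
    if sample == 0:
        # log2(0) is undefined; real data show noisy, deeply negative values.
        return float("-inf")
    return math.log2(sample / reference)


for reference_sex in ("female", "male"):
    for sample_sex in ("male", "female"):
        row = {c: expected_log2(sample_sex, reference_sex, c)
               for c in ("autosome", "X", "Y")}
        print(f"{reference_sex} reference, {sample_sex} sample: {row}")
```

A male sample against the female reference gives -1 for X and 0 for Y, which matches the first plot described in the question.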

Hi Etal, that was very interesting! Do you know if there is any option to set a different ploidy for normal samples? If we are not sure about the normal sample's ploidy, is it correct to use the rescale argument to improve visualization? Thank you in advance.

I'm not sure whether you mean the gender of the normal samples used to construct the reference, or the overall ploidy of some non-human species. The `reference` command checks the chromosomal gender of the input samples and adjusts automatically so that the reference is effectively male or female (with or without `-y`) even if the input samples are a mix of both genders.

For non-diploid species the log2 ratios are independent of ploidy, but when you use the `call` command to output absolute integer copy numbers, you can provide `--ploidy` as an argument.

All of the features of the `rescale` command are now included in `call`, so I recommend using `call` instead. Look at the `--ploidy` and `--center` options, in particular.
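To make the roles of centering and ploidy concrete, here is a rough Python sketch of the idea. It is a simplification under stated assumptions, not CNVkit's implementation: `center_log2` and `absolute_copies` are hypothetical helper names, centering is done on the median only, and tumor purity is ignored.

```python
import statistics


def center_log2(log2_values):
    """Shift log2 ratios so that their median sits at 0 (median centering)."""
    shift = statistics.median(log2_values)
    return [value - shift for value in log2_values]


def absolute_copies(log2_ratio, ploidy=2):
    """Convert a log2 ratio to an integer copy number for a given ploidy.

    Assumes a pure sample: copies = ploidy * 2 ** log2_ratio, then rounded.
    """
    return round(ploidy * 2 ** log2_ratio)


segments = [-0.5, -0.4, -0.6, -1.5]
centered = center_log2(segments)
print(centered)
print([absolute_copies(value, ploidy=2) for value in centered])
```

The log2 ratios themselves do not change with ploidy; only the conversion to integer copy numbers does, which is why ploidy enters at the `call` step.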

The problem I have to deal with is that the only normal sample I have for comparison is sequenced liver tissue, which is known to be potentially polyploid. I know it is not a good idea to use it as a normal sample, but it's my only source of normal tissue. I also know that I can run the analysis without a normal, but I want to at least give it a try. So, how can I use the ploidy or center argument here? All my data are consistently below the neutral value of 0 (I suppose due to polyploidy of the normal tissue), and rescaling corrected the visualization. Is this approach correct?

Polyploidy of the normal tissue shouldn't affect the log2 ratios if it's the same ploidy on all autosomes. Aneuploidy of the normal tissue will give confusing results, though.
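A toy Python example of why this holds, assuming read depth is proportional to copy number and each sample is normalized to its own median depth. This is a deliberately simplified model, not CNVkit's actual normalization, and the depths and the function name `log2_ratios` are invented for illustration.

```python
import math
import statistics


def log2_ratios(sample_depths, normal_depths):
    """Per-bin log2 ratio after scaling each sample to its own median depth."""
    sample_median = statistics.median(sample_depths)
    normal_median = statistics.median(normal_depths)
    return [
        math.log2((s / sample_median) / (n / normal_median))
        for s, n in zip(sample_depths, normal_depths)
    ]


# Tumor with a one-copy loss in bin 3 and a one-copy gain in bin 5.
tumor = [60, 60, 30, 60, 90]

diploid_normal = [60, 60, 60, 60, 60]         # 2 copies everywhere
tetraploid_normal = [120, 120, 120, 120, 120]  # 4 copies everywhere
aneuploid_normal = [120, 120, 60, 120, 120]    # bin 3 has half the copies

print(log2_ratios(tumor, diploid_normal))
print(log2_ratios(tumor, tetraploid_normal))  # same values as the diploid case
print(log2_ratios(tumor, aneuploid_normal))   # the loss in bin 3 is masked
```

With a uniformly tetraploid normal the ratios match the diploid case, while the aneuploid normal hides the tumor's loss in bin 3.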

If all or the majority of your segment means are below 0, then the problem is that too many regions had very low coverage, which skews the log2 ratios toward negative numbers (it will be closer to normally distributed if coverage is good and consistent). Try the `--drop-low-coverage` option to the `segment` command to remove the values that are causing at least some of the trouble. You may also want to re-run the pipeline with a larger off-target bin size (e.g. `batch --antitarget-avg-size 200000`).

So you are suggesting that the extremely negative log2 ratios (-12 to -20) I am getting are due to uniformly poor coverage or to non-specific target baits, right?

I tried the `--drop-low-coverage` argument and my data do look smoother. My concern is that removing low-coverage regions must introduce a bias against losses, unless this option removes low-coverage regions in both the normal and tumor samples. Is there an argument to specify the minimum number of reads aligned in a bin for that region to be included in the analysis?

Yes, that's it. The log2 scaling actually introduces a bias toward losses in low-coverage regions, which `--drop-low-coverage` counteracts. The tumor sample itself is a mix of tumor and some normal DNA, so any bins that were adequately captured in the normal cells within the tumor sample will have a log2 value above the `--drop-low-coverage` threshold. But if you're running germline samples, don't use that flag or you'll screen out true homozygous deletions.
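As a rough illustration of the effect, the Python sketch below filters out bins whose log2 value falls under a cutoff and compares the mean before and after. The cutoff of -10 and the bin values are arbitrary numbers chosen for this example; they are not CNVkit's actual threshold or real data, and `drop_low_coverage` here is a hypothetical helper, not the library's function.

```python
import statistics

# Hypothetical cutoff for illustration only; not CNVkit's real threshold.
ILLUSTRATIVE_MIN_LOG2 = -10.0


def drop_low_coverage(log2_values, min_log2=ILLUSTRATIVE_MIN_LOG2):
    """Keep only bins whose log2 value is at or above the cutoff."""
    return [value for value in log2_values if value >= min_log2]


# Four well-covered bins near neutral plus two bins with almost no reads.
bins = [0.05, -0.1, 0.0, 0.1, -15.0, -18.0]
kept = drop_low_coverage(bins)

print(statistics.mean(bins))  # dragged far below 0 by the two extreme bins
print(statistics.mean(kept))  # close to 0 once they are removed
```

A handful of near-empty bins is enough to pull an average well below zero, which is the skew described above.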

Thank you so much, Etal! Do you know what the threshold of `--drop-low-coverage` is?