The center in the scatter plot generated by CNVkit looks off
12 months ago
Jordan ★ 1.2k

Hi,

I ran the CNVkit pipeline on WGS samples. I have 4 tumor/normal pairs and pooled the normals.

Here is the command I used:
    cnvkit.py batch -p $OMP_NUM_THREADS $BAMs/*_T*.bam -n $BAMs/*_B*.bam -m wgs -f $refs --annotate $refFlat --output-reference $out/project.cnn --output-dir $out

I dropped low coverage reads using the following command:

    cnvkit.py segment $file -o $out/drop_low_cov/${sample}.cns

But my scatter plot looks quite weird. The Y chromosome has too many deletions, and in general it looks like the deletions are on a much larger scale.

Is there a way to address this?

Here is the plot

Thanks for the help!

These are female samples.

So why are you worried about the CN profile on the Y chromosome? There is no Y in your samples, so everything you're seeing there can be explained as being either one of the pseudoautosomal regions or sequence that has homology with an autosome.

All the samples are female. I was a bit worried to see the Y chromosome having so many deletions even though both normals and tumors are female samples. Other papers I have seen do not show such a high number of deletions on the Y chromosome either.

Do these other papers completely ignore Y for female samples? I know many pipelines just throw out anything on Y once the sample has been determined to be female. More sophisticated pipelines have extra logic to handle less common scenarios such as Down and Klinefelter syndromes because, if you have a large cohort, you'll almost certainly encounter them.

12 months ago
d-cameron ★ 2.3k

This appears to be a sex determination issue. As per the CNVkit documentation:

By default, copy number calls and log2 ratios will be relative to a diploid X chromosome and haploid Y.


This can be adjusted if you know the sex of your sample (or you want CNVkit to predict for you). See https://cnvkit.readthedocs.io/en/stable/sex.html for more details.
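To make the arithmetic behind that default concrete, here is a toy Python calculation. It is not CNVkit code: `expected_log2`, the `floor` value and the copy-number tables are made up for illustration, and real data is noisier than this. It only shows that a sample with zero copies of Y, compared against a reference that expects one copy, ends up with a strongly negative log2 ratio, which a plot will draw as a deletion.

```python
import math


def expected_log2(sample_copies, reference_copies, floor=0.01):
    """Return log2(sample / reference) for idealized copy numbers.

    `floor` is an arbitrary illustrative value that stands in for
    background signal so that zero copies does not give log2(0).
    """
    if reference_copies <= 0:
        raise ValueError("reference_copies must be positive")
    return math.log2(max(sample_copies, floor) / reference_copies)


# Reference expectation quoted above: diploid X, haploid Y.
reference = {"autosome": 2, "chrX": 2, "chrY": 1}

# Idealized copy numbers for a normal female and a normal male sample.
samples = {
    "female": {"autosome": 2, "chrX": 2, "chrY": 0},
    "male": {"autosome": 2, "chrX": 1, "chrY": 1},
}

for name, copies in samples.items():
    for chrom, ref_copies in reference.items():
        ratio = expected_log2(copies[chrom], ref_copies)
        print(f"{name:6s} {chrom:8s} log2 ratio = {ratio:.2f}")
```

In this toy model the female sample is neutral on the autosomes and X but far below zero on Y, while the male sample shows an apparent one-copy loss on X. As far as I know, the relevant knobs in CNVkit itself are the `sex` command and the `-x`/`--sample-sex` option of `call`, but check the linked page for the exact usage.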

The Y chromosome has too many deletions, and in general it looks like the deletions are on a much larger scale. Is there a way to address this?

In general, a deletion should have both a CN loss in the deleted region and a breakpoint that spans it. If you want to do a comprehensive genomic rearrangement assessment of your tumour samples, I would suggest the GRIDSS/PURPLE/LINX pipeline [shameless plug disclaimer - I'm first author of that preprint]. On our cohort, 2.1% of somatic CN transitions (excluding those at centromeres and reference gaps) have no explanatory SV. We have a few samples in the 20-40% range that show signs of DNA degradation, hence the higher CN false positive rate.
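As a rough idea of what "CN transition without an explanatory SV" means in practice, here is a minimal Python sketch. It is not how GRIDSS/PURPLE/LINX do it: the function name `unexplained_fraction`, the 1 kb `tolerance` and the coordinates are all hypothetical. It just counts how many copy number segment boundaries have no SV breakpoint on the same chromosome within a small window.

```python
from bisect import bisect_left


def unexplained_fraction(transitions, breakpoints, tolerance=1000):
    """Fraction of CN transitions with no SV breakpoint within `tolerance` bp.

    Both inputs are lists of (chromosome, position) tuples.
    """
    # Index breakpoint positions per chromosome, sorted for binary search.
    by_chrom = {}
    for chrom, pos in breakpoints:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    unexplained = 0
    for chrom, pos in transitions:
        positions = by_chrom.get(chrom, [])
        # First breakpoint at or after the left edge of the window.
        i = bisect_left(positions, pos - tolerance)
        if i == len(positions) or positions[i] > pos + tolerance:
            unexplained += 1
    return unexplained / len(transitions) if transitions else 0.0


# Toy data: four CN transitions, three SV breakpoints.
transitions = [
    ("chr1", 1_000_000),
    ("chr1", 5_000_000),
    ("chr2", 2_500_000),
    ("chr3", 750_000),
]
breakpoints = [
    ("chr1", 1_000_200),
    ("chr1", 4_999_500),
    ("chr2", 9_000_000),
]

print(unexplained_fraction(transitions, breakpoints))
```

A high fraction on real data would suggest that many of the CN changes are noise rather than true rearrangements, which is the point about the degraded samples above.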