CNVKit noisy scatter plot
2.0 years ago
tanbiswas6 • 0

Hi, I'm using WES data from tumor samples to call CNVs with CNVkit. My input was recalibrated BAM (recal.bam) files. I used CNVkit's batch command (https://cnvkit.readthedocs.io/en/stable/pipeline.html) to generate the CNV profile. The output looks like this.

Then I applied a few stringent log2 copy ratio thresholds and used several commands (segmetrics, genemetrics) to reduce the noise, but the data is still noisy. The CNVkit documentation recommends lowering the number of bins to decrease noise, but I couldn't find a command for that. The plot looks like this.

After getting this data, I tried to plot individual chromosomes. I used this command:

~/cnvkit$ cnvkit.py scatter Tumor.cnr -s Tumor.cns -c chr8:80000000-120000000 -g PDP1,POP1 -o chr8.jpg --segment-color red
Showing 2427 probes and 2 selected genes in region chr8:79999999-120000000
Wrote chr8.jpg

The plot looks like this.

~/cnvkit$ cnvkit.py scatter Tumor.cnr -s Tumor.cns -c chr8 -g PDP1,POP1 -o chr8_2.jpg --segment-color red
Showing 11090 probes and 2 selected genes in region chr8
Wrote chr8_2.jpg

The plot looks like this. I don't know how to fix this. I've been trying to fix it for 8 months, moving from one tool to another. I would be very grateful if anyone could help.

CNV CNVkit copy number profile CNV on WES data • 2.0k views

In my experience this looks like a sample that failed QC, not a CNVkit problem. It is not really possible to refine this. The coverage profiles of the 2 samples are too different. I might trust high-amplitude variants, but not the calling in general. Either the normal or the tumor tissue library prep / sequencing failed (or was just very different, which is fine for SNV calling but not for CNA).
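
If you want a rough number to compare the tumor and normal profiles, here is a minimal sketch that measures how much the log2 ratio jumps between neighbouring bins. The function name and the choice of statistic are my own illustration, not CNVkit's metrics; it only assumes the .cnr file is tab-separated with a header containing `chromosome` and `log2` columns.

```python
import csv
import statistics


def adjacent_log2_spread(cnr_lines):
    """Median absolute difference between neighbouring bins' log2 ratios.

    cnr_lines: an open .cnr file, or any iterable of tab-separated lines,
    whose header row contains at least 'chromosome' and 'log2'.
    This is an illustrative noise measure, not a CNVkit function.
    """
    reader = csv.DictReader(cnr_lines, delimiter="\t")
    diffs = []
    prev_chrom, prev_log2 = None, None
    for row in reader:
        chrom, log2 = row["chromosome"], float(row["log2"])
        # Only compare bins that sit on the same chromosome.
        if chrom == prev_chrom:
            diffs.append(abs(log2 - prev_log2))
        prev_chrom, prev_log2 = chrom, log2
    if not diffs:
        raise ValueError("need at least two bins on one chromosome")
    return statistics.median(diffs)
```

You would call it as `adjacent_log2_spread(open("Tumor.cnr"))`. A larger value means a noisier profile; I would only use it to compare samples processed the same way, not as an absolute cutoff.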


This is for only one tumor sample. After getting the sequencing data I checked its quality and it was very good. We also used this sample to identify indels and SNVs with the GATK pipeline, and we had no problems then. The data only looks noisy for CNV calling. I don't know where I went wrong.


It's unlikely that this is your "fault"; the data is just noisy. That happens, and you won't be able to use it for CNV calling. Move on.


I agree with Devon's comment in general. As additional info: sometimes you can work around the data by generating the normal reference from only those samples that are similar to your tumor samples, but you need 1) experience and 2) the motivation to do so (it can easily take a day of your time). Important: it may still not work out. I did it for a project with ultra-rare cancers where every sample was valuable, but I don't recommend it in general. For that project I even had to do "FrankenTumor" CNV calling, since the normal tissue was partially tumor tissue (FFPE samples; the seemingly normal tissue was actually affected by cancer). It takes days of manual work; don't recommend, 0 stars out of 5.

So, in conclusion: it is possible to manually "correct" the data, but if you can afford to lose 1 sample, just move on.


You can also try a different CNV caller. In my experience, the performance can vary substantially.


Could you suggest any?


There are some previous discussions on this topic, such as: Whole Exome CNV tools

2.0 years ago
Eric T. ★ 2.7k
1. Are you processing your tumor samples individually, with a pooled reference, or with matched normal samples? If these are individual tumor samples with a flat reference, it's expected that WES will be noisy, and genemetrics for individual genes of interest based on the processed .cnr file may be as good as you'll get.
2. Which version of CNVkit are you using? The scatter plot of PDP1,POP1 looks odd; if that's a bug where data in the .cnr or .cns files is not being displayed properly, it may have been fixed in a more recent version.
3. You can build a new reference profile with the "reference" command. To build a flat reference with larger bin sizes, use the "target" and "antitarget" commands, then give the output of those to "reference". Your new reference profile can be used with the "batch" command.
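
The order of commands in point 3 can be sketched as below. This is only a dry run that prints the command lines; it does not call CNVkit. The input file names (`my_baits.bed`, `access.bed`, `genome.fa`, `Tumor.recal.bam`), the output names, and the two bin sizes are placeholders I made up for illustration, so substitute your own and check the options against `cnvkit.py <command> --help` for your installed version.

```python
import shlex


def flat_reference_commands(baits_bed, access_bed, genome_fasta,
                            tumor_bam, target_avg_size, antitarget_avg_size):
    """Return the CNVkit command lines, in order, as argument lists.

    Nothing is executed here; the caller decides whether to run them.
    """
    targets = "my_targets.bed"          # hypothetical output names
    antitargets = "my_antitargets.bed"
    reference = "flat_reference.cnn"
    return [
        # 1. Target bins; --avg-size sets the approximate size of split bins.
        ["cnvkit.py", "target", baits_bed, "--split",
         "--avg-size", str(target_avg_size), "-o", targets],
        # 2. Antitarget (off-target) bins between the targets.
        ["cnvkit.py", "antitarget", targets, "-g", access_bed,
         "--avg-size", str(antitarget_avg_size), "-o", antitargets],
        # 3. Flat reference: only the bin files and the genome, no normal samples.
        ["cnvkit.py", "reference", "-f", genome_fasta,
         "-t", targets, "-a", antitargets, "-o", reference],
        # 4. Re-run the pipeline on the tumor with the new reference.
        ["cnvkit.py", "batch", tumor_bam, "-r", reference, "-d", "results/"],
    ]


commands = flat_reference_commands(
    baits_bed="my_baits.bed",
    access_bed="access.bed",
    genome_fasta="genome.fa",
    tumor_bam="Tumor.recal.bam",
    target_avg_size=500,          # placeholder value, not a recommendation
    antitarget_avg_size=300000,   # placeholder value, not a recommendation
)
for argv in commands:
    print(shlex.join(argv))
```

Larger bins mean fewer, less noisy data points at the cost of resolution, so it is worth trying a couple of sizes and comparing the resulting scatter plots.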

Hi Eric

1. I'm processing my tumor samples with matched normal samples. I'm using recalibrated BAM (recal.bam) files as input. First I used the target command with --annotate and --short-names to generate the mytargets_short_names.bed file. Then I used the batch command with this target file.
2. I'm using CNVkit 0.9.7. The scatter plots look odd, showing almost nothing except a few dots at the far right.
3. I haven't used the my_reference file to estimate copy number for other tumors, because I didn't get a good profile. I've run the target and antitarget commands and have the output BED files, but I couldn't work out how to use them, and I couldn't find in the documentation how to increase the bin size with these commands. We don't have expertise with any of these tools, so everything I've done comes from reading. If possible, could you please list the commands that should be run, step by step, to get a good profile? I don't know what to do next. Please suggest how to solve this.

Thank you for your attention.

2.0 years ago
Eugene A ▴ 160

I do not have experience with CNVkit, but I've used GATK for the same task: https://software.broadinstitute.org/gatk/documentation/article?id=11682 https://software.broadinstitute.org/gatk/documentation/article?id=11683

The key parts for reducing noise there were:
1) Assemble an appropriate panel of normals. I downloaded and processed normal WES samples from the SRA archive that were generated with the same technology as my own tumor sample (25 samples were enough for me).
2) Call CNVs only in the regions covered in your raw data. I generated a BED file containing only exons with some minimum number of reads in my data (100? I don't remember the exact threshold I chose, but I tested several) and used this BED file in the GATK pipeline.

Hope it helps


Hi Eugene

1. So, you are suggesting building a reference file from a PON? I generated a PON from 7 samples in the GATK pipeline for SNV calling. Would it be okay to use those 7 samples to generate a CNV reference file in CNVkit? Or are you suggesting downloading more normal WES samples and then generating the CNV reference file?
2. I've used the BED file that includes the targets of my tumor samples. But that BED file has all the gene IDs, names, etc., so I just used CNVkit's target command with the --short-names argument to generate a new BED file of my targets with the short gene names.

Thank you.

1. I'm not sure what panel of normals you built for SNVs (CreateSomaticPanelOfNormals?), but you need to follow these instructions: https://gatk.broadinstitute.org/hc/en-us/articles/360035531092?id=11682#2 (also, if you have the matching normal sample it should be OK to use it, provided it is of good quality and is not the cause of your problems).

2. You need a BED file with the coordinates of exons that are covered (meaning they have a substantial number of reads) in your real data. It could be generated with bedtools, I think. The drawback is that you might miss homozygous deletions, but in general it might improve your noise situation.
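
As a minimal sketch of the filtering step in point 2: the function below keeps only intervals whose read count reaches a threshold. It assumes you already have a tab-separated file where each line is a BED interval with the read count appended as the last column (the kind of table a per-interval counting tool such as bedtools can produce; that input layout is my assumption). The function name is hypothetical, and the default of 100 is just the rough figure mentioned above, not a tested recommendation.

```python
def filter_covered_intervals(lines, min_reads=100):
    """Keep BED intervals whose last column (a read count) is >= min_reads.

    lines: iterable of tab-separated lines, BED columns followed by a count.
    Returns the kept intervals as BED lines with the count column removed.
    """
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if int(fields[-1]) >= min_reads:
            kept.append("\t".join(fields[:-1]))
    return kept


# Tiny made-up example: three exons with read counts 250, 12 and 100.
example = [
    "chr8\t1000\t1200\tGENE_A\t250\n",
    "chr8\t5000\t5150\tGENE_B\t12\n",
    "chr8\t9000\t9300\tGENE_C\t100\n",
]
for bed_line in filter_covered_intervals(example):
    print(bed_line)
```

Write the kept lines to a new BED file and use that as the interval list. Since a homozygous deletion in the tumor would also show up as an exon with almost no reads, it is safer to base the counts on the normal sample(s) rather than on the tumor.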


Yes, I have the BED file with the coordinates. I used that file to make my target file for CNVkit. I'll follow the link. Thank you.