How to interpret a CNV scatter plot
2.5 years ago

Dear all,

I need help interpreting these copy number scatter plots from 2 patients. I'm interested in copy number changes in the CCR4 gene on chr3. The data is derived from a custom target panel. I can see from the plots that both patients have copy number losses in CCR4. However, I need help understanding 1) why patient C has 2 orange segments in the CCR4 region (would each indicate a different allele?) and 2) whether we can comment on the bi-allelic or mono-allelic nature of the losses in each patient.

2.5 years ago
Amitm ★ 2.1k

Hi, I am assuming that this is from a sequencing-based approach. Patient B has a very strong signal for a deletion segment spanning roughly 30 Mb to 60 Mb, which includes your gene. Patient C appears to have a deletion signal too, but it looks focal, around CCR4 only. The orange line is the 'segmentation' outcome, which takes into account how many probes/markers in the surrounding area differ from the baseline trend, i.e. copy ratio '0' (no change in copy number). Because the deletion signal in patient B is much larger (~30 Mb to 60 Mb), the 'segmentation' process placed that part of the region into a separate segment (i.e. a region which is in a different copy number state), hence the separate orange line at copy ratio '-1'. In patient C, the deletion signal is not widespread, and hence the 'segmentation' process hasn't created a separate 'segment' (orange line).

In terms of the figure and what's available from the plot, it could be said that patient B has a mono-allelic loss (as the separate orange line is at '-1'). You could say the same for patient C as well, albeit with somewhat lower confidence. Hope this helps.
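To make the '-1' reasoning concrete, here is a minimal sketch. It assumes the y-axis is a log2 copy ratio, the baseline is diploid, and the sample is 100% 'diseased' cells; none of that is confirmed by the plot itself, and the function name is just illustrative.

```python
def copy_number_from_log2(log2_ratio, ploidy=2):
    """Copy number implied by a log2 copy ratio.

    Assumes a pure sample and a diploid baseline (both are assumptions).
    """
    return ploidy * 2 ** log2_ratio


# A ratio of 0 means no change (2 copies); -1 means one copy left,
# which is what a mono-allelic loss looks like under these assumptions.
for ratio in (0.0, -1.0):
    print(ratio, copy_number_from_log2(ratio))
```

A segment sitting between 0 and -1 (for example around -0.6) implies a non-integer copy number under these assumptions, which usually points to a mixture of cells rather than a clean single-copy loss.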

Hi Amit, thanks so much, this was very helpful. Sorry I didn't ask my question more clearly: in my first question I was wondering why there are 2 separate orange segments in the CCR4 region in patient C (one around -0.6, the other around -1.8). Also, thanks for your comment regarding mono- vs bi-allelic loss. What would make you say a deletion is bi-allelic? Would it be the segmentation line going far down, beyond -5? Thanks a lot again, much appreciated!

No worries. To better understand the outcome in patient C, I would recommend looking at the 'segmentation' output from which these plots are made. It should be in a format similar to

chr<TAB>Start<TAB>End<TAB>seg.mean<TAB>marker support
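A minimal sketch of reading a file in that layout and listing the segments that overlap a window of interest is given below. The example rows and the window coordinates are made up for illustration (they are not the real CCR4 coordinates or the OP's data), and the column order is assumed to match the line above.

```python
import csv
import io


def read_segments(handle):
    """Parse tab-separated rows: chr, start, end, seg.mean, marker support."""
    segments = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 5 or not row[1].isdigit():
            continue  # skip blank lines and a header row, if present
        chrom, start, end, seg_mean, markers = row[:5]
        segments.append({
            "chrom": chrom,
            "start": int(start),
            "end": int(end),
            "seg_mean": float(seg_mean),
            "markers": int(float(markers)),
        })
    return segments


def overlapping(segments, chrom, start, end):
    """Segments on `chrom` that overlap the interval [start, end]."""
    return [s for s in segments
            if s["chrom"] == chrom and s["start"] <= end and s["end"] >= start]


# Made-up example rows, NOT real data.
example = io.StringIO(
    "chr\tStart\tEnd\tseg.mean\tmarker support\n"
    "chr3\t1\t32000000\t0.02\t410\n"
    "chr3\t32000001\t32900000\t-0.6\t12\n"
    "chr3\t32900001\t33100000\t-1.8\t7\n"
    "chr3\t33100001\t198000000\t0.01\t950\n"
)
segments = read_segments(example)

# Hypothetical window around the gene of interest (placeholder coordinates).
hits = overlapping(segments, "chr3", 32_800_000, 33_000_000)
for seg in hits:
    print(seg["start"], seg["end"], seg["seg_mean"], seg["markers"])
```

The marker support column is worth checking: a segment backed by only a handful of bins or probes is much less trustworthy than one backed by hundreds.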


The segmentation output should tell you what those two small segments in patient C look like. The genomic window of the plot is quite large, and it's hard to say by looking at the plot alone. Also, note that 'segmentation' is an algorithmic process, and not everything that comes out of it can necessarily be explained biologically. To ascertain whether there could be a bi-allelic loss, given that this is sequencing data (?), you could open the BAM file in the IGV browser. For a bi-allelic loss (meaning both alleles are gone), there would be hardly any reads mapping to the locus. But that doesn't seem to be the case from the above plot. On top of all this, there is also the proportion of 'diseased' vs. 'non-diseased' cells in the sample to consider. In an ideal situation with 100% of cells being 'diseased', a bi-allelic loss should mean no reads/signal at all.
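The effect of the 'diseased' cell fraction can be sketched with a simple mixing model. This is an idealised illustration, assuming a diploid baseline, a single population of 'diseased' cells, and no noise or bias in the data; the function name is hypothetical.

```python
import math


def expected_log2_ratio(diseased_copies, purity, normal_copies=2):
    """Expected log2 copy ratio for a mix of 'diseased' and normal cells.

    purity is the fraction of 'diseased' cells (0 to 1). Idealised model:
    diploid baseline, one clone, no noise (all assumptions).
    """
    mixed = purity * diseased_copies + (1 - purity) * normal_copies
    if mixed == 0:
        return float("-inf")  # no copies left anywhere: no signal at all
    return math.log2(mixed / normal_copies)


print(expected_log2_ratio(1, 1.0))  # mono-allelic loss, pure sample
print(expected_log2_ratio(0, 1.0))  # bi-allelic loss, pure sample
print(expected_log2_ratio(0, 0.5))  # bi-allelic loss, half the cells normal
```

Under this model a bi-allelic loss in a sample with 50% normal cells gives the same value (-1) as a mono-allelic loss in a pure sample, which is why the segment level alone cannot settle the mono- vs bi-allelic question without an idea of the sample's purity.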

Got it, thank you very much! I will check the segmentation output and yes you are correct, this is sequencing data.

I have segmentation output in a format similar to the one you listed (chr<TAB> and so on), but it is unclear to me how to get from there to a plot like the one shown in the question. I came across multiple packages (CNVplot, chromosomecopy number, etc.), but their input formats seem to be different. Can you point me to a resource on the steps to get from the format below

chr<TAB>Start<TAB>End<TAB>seg.mean<TAB>marker support

to a CNV plot?

The plots that the OP has shown come from a copy-number package, I think. You could attempt to create your own plot from scratch using the segmentation output, but that would be a lot of work, as the plots are not only about the segments (Start, End) but also about their location within the genome (and gene locations as well). The grey dots in the plot are individual bins (if from sequencing data) or individual probes (if from array data). This detail is not available in the segmentation output you have indicated. CNVkit is a good package for NGS-based data, and ASCAT or DNAcopy for array-based data would be a good place to start. All of these packages have lots of plotting options.
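If you still want a quick look from the segmentation output alone, a rough sketch could be as follows. It draws only the segment lines (no grey dots and no gene track, since that information is not in the file), uses matplotlib, and the example segments are made up. The helper names are hypothetical.

```python
def segment_lines(segments, chrom):
    """Return (start_mb, end_mb, seg_mean) tuples for one chromosome.

    `segments` is a list of (chrom, start, end, seg_mean) tuples.
    """
    rows = sorted((s for s in segments if s[0] == chrom), key=lambda s: s[1])
    return [(start / 1e6, end / 1e6, mean) for _, start, end, mean in rows]


def plot_segments(segments, chrom, out_png):
    """Draw segments as orange horizontal lines and save to a PNG file."""
    import matplotlib
    matplotlib.use("Agg")  # render to file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 3))
    ax.axhline(0, color="grey", linewidth=0.5)  # baseline: no change
    for start_mb, end_mb, mean in segment_lines(segments, chrom):
        ax.hlines(mean, start_mb, end_mb, colors="orange", linewidth=3)
    ax.set_xlabel(f"{chrom} position (Mb)")
    ax.set_ylabel("seg.mean (copy ratio)")
    fig.savefig(out_png, dpi=150)
    plt.close(fig)


# Made-up segments, NOT real data.
example_segments = [
    ("chr3", 60_000_000, 198_000_000, 0.0),
    ("chr3", 0, 30_000_000, 0.0),
    ("chr3", 30_000_000, 60_000_000, -1.0),
    ("chr4", 0, 190_000_000, 0.0),
]
lines = segment_lines(example_segments, "chr3")
print(lines)
# To write the figure: plot_segments(example_segments, "chr3", "chr3_segments.png")
```

For anything beyond a quick check, the plotting functions of the packages mentioned above are the better route, because they also take the per-bin or per-probe values as input.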