Question: Weights in cnvkit cnr output
biologist10 wrote (12 months ago):

Hello everyone,

I have read several Biostars posts about the weight column in CNVkit's .cnr files but am still confused. From https://cnvkit.readthedocs.io/en/stable/pipeline.html, I understand that the weights in a .cnr file depend on the size of the bin, the deviation of the bin's log2 value in the reference from 0, and the "spread" of the bin in the reference. I am not particularly familiar with these terms, so could someone explain what exactly they mean, and whether I can take the .cnr weight column as a measure of how confident we can be about a particular CNV call?

Also, suppose I want to check a particular range of a chromosome for a CNV that spans, for example, 5 rows of the .cnr output. Do I take the average of the 5 log2 values but the SUM of the 5 weight values? I am asking because the post "How is the "weight" calculated by CNVkit?" says that the .cns file takes the sum of the weights of all the bins that make up a segment. In that case, would a higher weight be equivalent to a more confident call?

modified 10 months ago by Eric T. • written 12 months ago by biologist10
Eric T. (San Francisco, CA) wrote (10 months ago):

The bin weights are correlated with the stability of the copy number signal in that bin's genomic region, as estimated from a pool of control samples (if available) and some other heuristics. In the development version of CNVkit currently on GitHub, the bin weights are more directly an estimate of `1 / variance` in log2 coverage ratios at that site.
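To make the "average the log2 values, sum the weights" idea concrete, here is a minimal sketch in plain Python. It is not CNVkit's own code. The five rows, the function name `summarize_bins`, and the choice of a weighted mean as the summary are assumptions made for illustration.

```python
# Hypothetical rows from a .cnr file: (start, end, log2, weight).
# The values are invented for illustration only.
bins = [
    (1000, 1500, -0.52, 0.90),
    (1500, 2000, -0.47, 0.85),
    (2000, 2500, -0.61, 0.40),
    (2500, 3000, -0.50, 0.95),
    (3000, 3500, -0.44, 0.80),
]


def summarize_bins(bins):
    """Return (weighted mean log2, total weight) for a list of bins."""
    total_weight = sum(weight for _, _, _, weight in bins)
    if total_weight <= 0:
        raise ValueError("bins must have a positive total weight")
    # Each bin contributes to the mean in proportion to its weight,
    # so noisy (low-weight) bins pull the estimate less.
    weighted_sum = sum(log2 * weight for _, _, log2, weight in bins)
    return weighted_sum / total_weight, total_weight


mean_log2, total_weight = summarize_bins(bins)
print(f"weighted mean log2 = {mean_log2:.4f}, total weight = {total_weight:.2f}")
```

If the weights really were `1 / variance` and the bins were independent, the variance of this weighted mean would be `1 / total_weight`, which is one way to see why a larger summed weight tends to go with a more stable estimate. Treat that as a rough intuition rather than a calibrated confidence measure.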

I wouldn't use the weights directly as an estimate of confidence. Instead, I'd use `segmetrics` to calculate a confidence interval for each segment. In the development version there's also an option to do a one-sample t-test of a segment's bins versus neutral copy number (log2 = 0), which gives you a p-value, if that's what you need.
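As a rough illustration of those two ideas, the sketch below computes a one-sample t statistic against log2 = 0 and a percentile bootstrap interval for the weighted mean of a segment's bins. It uses only the Python standard library and is not how `segmetrics` is implemented. The data, the function names `t_statistic` and `bootstrap_ci`, and the defaults (`alpha=0.05`, `n_boot=1000`, `seed=0`) are all assumptions.

```python
import math
import random
import statistics

# Hypothetical log2 ratios and weights for the bins in one segment.
log2_values = [-0.52, -0.47, -0.61, -0.50, -0.44]
weights = [0.90, 0.85, 0.40, 0.95, 0.80]


def t_statistic(values, mu=0.0):
    """One-sample t statistic of the values against mu (unweighted)."""
    n = len(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return (statistics.mean(values) - mu) / sem


def bootstrap_ci(values, weights, alpha=0.05, n_boot=1000, seed=0):
    """Percentile bootstrap interval for the weighted mean, resampling bins."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pairs = list(zip(values, weights))
    means = []
    for _ in range(n_boot):
        # Resample bins with replacement, keeping each value with its weight.
        sample = rng.choices(pairs, k=len(pairs))
        total = sum(w for _, w in sample)
        means.append(sum(v * w for v, w in sample) / total)
    means.sort()
    low = means[int((alpha / 2) * n_boot)]
    high = means[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


t_stat = t_statistic(log2_values)
ci_low, ci_high = bootstrap_ci(log2_values, weights)
print(f"t = {t_stat:.2f} with {len(log2_values) - 1} degrees of freedom")
print(f"95% bootstrap interval for the weighted mean: [{ci_low:.3f}, {ci_high:.3f}]")
```

Turning the t statistic into a p-value needs the t distribution, which the standard library does not provide; `scipy.stats.ttest_1samp(log2_values, 0.0)` would return both if SciPy is installed. An interval that excludes 0 points the same way as a small p-value: the segment's bins are consistently shifted away from neutral copy number.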