Question: How is the "weight" calculated by CNVkit?
Asked 3.2 years ago by Hällyss (CHU Angers):

Hello,

We are looking for a score that could help eliminate many false-positive (FP) calls in CNVkit results, and we think the weight could be used for this. We would like to know what this weight means, ideally as an equation. In the CNVkit manual, we found this extract:

A weight is assigned to each remaining bin depending on:

  1. The size of the bin;
  2. The deviation of the bin’s log2 value in the reference from 0;
  3. The “spread” of the bin in the reference.

(The latter two only apply if at least one normal/control sample was used to build the reference.)
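To make the three factors in this extract concrete, here is a toy weighting function. It is a hypothetical illustration, not CNVkit's formula (the real logic lives in cnvlib.fix.apply_weights and differs): the `Bin` type, the `toy_bin_weight` name, the linear size cap, the `1 - |log2|` deviation penalty, the `1 - spread**2` variance penalty, the multiplicative combination and the `floor` value are all assumptions chosen for readability.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    start: int         # bin start coordinate
    end: int           # bin end coordinate (exclusive)
    ref_log2: float    # the bin's log2 value in the reference
    spread: float      # the bin's spread in the reference (0 if no normals)


def toy_bin_weight(b: Bin, median_size: float, has_normals: bool,
                   floor: float = 1e-4) -> float:
    """Hypothetical weight in (0, 1] built from the manual's three factors.

    NOT CNVkit's formula; an illustration of the idea only.
    """
    # 1. Size: larger bins collect more reads, so they earn more weight
    #    (capped at 1 relative to the median bin size).
    size_term = min(1.0, (b.end - b.start) / median_size)
    if not has_normals:
        # Flat reference: only the size factor is available.
        return max(floor, size_term)
    # 2. Deviation of the reference log2 from 0: bins that already look
    #    gained or lost in the normals are less trustworthy.
    dev_term = 1.0 - min(1.0, abs(b.ref_log2))
    # 3. Spread across normals: treat spread like an SD, penalise by variance.
    spread_term = 1.0 - min(1.0, b.spread ** 2)
    # Multiply the terms and keep the result strictly positive.
    return max(floor, size_term * dev_term * spread_term)


if __name__ == "__main__":
    median = 200
    print(toy_bin_weight(Bin(0, 200, 0.0, 0.0), median, True))   # clean bin
    print(toy_bin_weight(Bin(0, 100, 0.5, 0.5), median, True))   # noisy bin
    print(toy_bin_weight(Bin(0, 100, 0.5, 0.5), median, False))  # flat ref
```

Multiplying the terms means any single bad factor pulls the weight down. With a flat reference only the size term applies, mirroring the manual's note that the last two factors need at least one normal sample.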

So, we have several questions:

  • What does "bin" mean? The segment? A bin in my_target.bed or my_antitarget.bed?
  • What is the "spread"? The number of bins in the segment? The length of the segment? Something else?
  • The score seems to be strongly affected by segment size and/or the number of bins in the segment. Is this the case?
  • Is it possible to get an equation for the weight?

Thank you

Alice

Tags: cnv, weight, cnvkit

WouterDeCoster commented:

You wrote in the tags which tool this question is about, but that would have been useful information in your post as well.

Hällyss replied:

My apologies, and thank you for your answer.
Answered 3.2 years ago by Eric T. (San Francisco, CA):

So:

  • Bins are the unsegmented regions seen in my_target.bed, my_antitarget.bed, and the .cnn and .cnr files emitted by CNVkit. Sometimes also called "probes" in the code.
  • Spread is the statistical spread of coverages in a bin observed across all of the samples in your pooled reference, similar to standard deviation but calculated differently to be more robust to outliers.
  • The weight listed in the segmented .cns files is the sum of the weights of the bins/probes spanned by the segment. It correlates with segment length and number of bins, but will be a bit lower if the segment covers a region with less reliable sequencing coverage or mapping (i.e. lower-weight bins).
  • The calculation is in the function cnvlib.fix.apply_weights. It is not one equation; it depends on which data sources are available, mainly whether the reference is paired, pooled, or flat.
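The points about spread and segment weights can be sketched in a few lines of Python. Assumptions: `robust_spread` uses the scaled median absolute deviation as a stand-in robust estimator (CNVkit's reference builder uses its own estimator), and the hypothetical `segment_weight` helper counts a bin as part of a segment only when it lies entirely inside the segment's coordinates.

```python
import statistics
from typing import List, Sequence, Tuple

# (chromosome, start, end, weight), like the bin rows of a .cnr file
BinRow = Tuple[str, int, int, float]


def robust_spread(coverages: Sequence[float]) -> float:
    """Spread of one bin's coverage across the reference samples.

    Stand-in estimator: median absolute deviation, scaled by 1.4826 so it
    matches the standard deviation for normally distributed data. Unlike
    the SD, a single outlier sample barely moves it.
    """
    med = statistics.median(coverages)
    mad = statistics.median([abs(x - med) for x in coverages])
    return 1.4826 * mad


def segment_weight(bins: List[BinRow], chrom: str,
                   seg_start: int, seg_end: int) -> float:
    """Hypothetical helper: sum the weights of bins fully inside a segment."""
    return sum(w for c, s, e, w in bins
               if c == chrom and s >= seg_start and e <= seg_end)


if __name__ == "__main__":
    # One outlier sample (50) inflates the SD but not the robust spread.
    cov = [10, 11, 9, 10, 50]
    print(robust_spread(cov), statistics.stdev(cov))

    # Three bins on chr1, one of them unreliable (low weight).
    bins = [("chr1", 0, 100, 1.0), ("chr1", 100, 200, 1.0),
            ("chr1", 200, 300, 0.2), ("chr2", 0, 100, 1.0)]
    print(segment_weight(bins, "chr1", 0, 300))
```

With one unreliable bin (weight 0.2), the three-bin segment gets a weight of about 2.2 instead of 3.0: the weight grows with the number of bins but is "a bit lower" where coverage is less reliable.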

Also see cnvkit.py segmetrics --ci and cnvkit.py call --filter ci for filtering out potential FP segments by calculating confidence intervals for each segment's mean log2 ratio.
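The confidence-interval idea can be illustrated with a toy percentile bootstrap. This is a sketch of the general technique, not CNVkit's implementation: the unweighted mean, the resample count, the fixed seed, and the hypothetical `keep_segment` rule (treat a segment as neutral when its interval contains 0) are assumptions; the CNVkit documentation describes how segmetrics --ci and call --filter ci actually behave.

```python
import random
from typing import List, Tuple


def bootstrap_ci(log2_values: List[float], n_boot: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the mean log2 ratio of a segment's bins."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(log2_values)
    # Resample the segment's bins with replacement and record each mean.
    means = sorted(sum(rng.choices(log2_values, k=n)) / n
                   for _ in range(n_boot))
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


def keep_segment(log2_values: List[float], **kwargs) -> bool:
    """Toy filter: keep a segment only if its CI excludes log2 = 0 (neutral)."""
    lo, hi = bootstrap_ci(log2_values, **kwargs)
    return lo > 0 or hi < 0


if __name__ == "__main__":
    gain = [0.55, 0.6, 0.62, 0.58, 0.65, 0.6]    # consistently above 0
    noise = [-0.3, 0.3, -0.2, 0.25, 0.05, -0.1]  # scattered around 0
    print(keep_segment(gain), keep_segment(noise))
```

A real gain with tightly clustered bins yields an interval that stays away from 0, while a segment whose bins scatter around 0 gets an interval straddling it and would be discarded as a likely false positive.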
