Question: CNVnator CNV-calling result interpretation
milk841103 wrote, 3.7 years ago:

Hello all,

I am a newcomer to bioinformatics and don't have a strong stats background. While using CNVnator v0.3.2 for a project, I ran into some questions when trying to make sense of the results from the CNV-calling step. I would really appreciate it if anyone could help me with them.

  1. In many papers and older posts, people refer to the values produced by the CNV-calling step as p-values and filter their raw CNV calls using p = 0.05 as the cut-off. However, the README file of the newest version of CNVnator refers to them as e-values instead. Does anyone know what has changed in the newest version of CNVnator? Are the e-values converted from the p-values (which are calculated from the t-test), and if so, how? Or can the e-values and p-values in the output be treated interchangeably?

  2. For e-val2, what is meant by "the region to be in the tail of Gaussian distribution"? Can I interpret this value as the significance of the call being a CNV?

  3. For e-val3 and e-val4, what is meant by "for the middle of CNV", and what is the purpose of looking specifically at the middle of the CNV?

Here is the information given in the CNVnator README file about the output, for your reference:

normalized_RD -- normalized to 1.

e-val1 -- is calculated using t-test statistics.

e-val2 -- is from the probability of RD values within the region to be in the tails of a gaussian distribution describing frequencies of RD values in bins.

e-val3 -- same as e-val1 but for the middle of CNV

e-val4 -- same as e-val2 but for the middle of CNV

q0 -- fraction of reads mapped with q0 quality
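
For readers who want to apply the cut-off mentioned in question 1, here is a minimal sketch of parsing and filtering call lines. It assumes each call is one whitespace-separated line with nine columns in the order CNV type, coordinates, size, normalized_RD, e-val1, e-val2, e-val3, e-val4, q0; check that against your own output. The names `CnvCall`, `parse_call`, `filter_calls` and the thresholds 0.05 and 0.5 are hypothetical choices for illustration, not part of CNVnator.

```python
from typing import Iterable, List, NamedTuple


class CnvCall(NamedTuple):
    """One CNV call line (column order is an assumption, see the note above)."""
    cnv_type: str
    region: str
    size: float
    normalized_rd: float
    e_val1: float
    e_val2: float
    e_val3: float
    e_val4: float
    q0: float


def parse_call(line: str) -> CnvCall:
    """Split one whitespace-separated call line into typed fields."""
    fields = line.split()
    if len(fields) < 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
    # The first two columns stay as text; the next seven are numeric.
    return CnvCall(fields[0], fields[1], *(float(x) for x in fields[2:9]))


def filter_calls(lines: Iterable[str],
                 max_e_val1: float = 0.05,
                 max_q0: float = 0.5) -> List[CnvCall]:
    """Keep calls with e-val1 below max_e_val1 and q0 below max_q0.

    Both thresholds are illustrative defaults, not recommendations.
    """
    calls = [parse_call(line) for line in lines if line.strip()]
    return [c for c in calls if c.e_val1 < max_e_val1 and c.q0 < max_q0]
```

With a call file on disk this could be used as `filter_calls(open("calls.txt"))`, where `calls.txt` is a placeholder file name. Filtering on q0 as well as e-val1 is a design choice here: a region where most reads have mapping quality zero is hard to trust regardless of its e-value.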

Thank you in advance for comments and help!


That explanation for e-val2 sounds like it's the same thing as a p-value. q0 is the fraction of reads with mapping quality zero; a high value indicates dubious mapping in the region of the CNV.

written 3.7 years ago by Vivek

Yeah, I found the explanations are basically the same as when the values were labelled as p-values; only the name has changed to e-value. But in my results I get some really large e-values (>>1), which is impossible for p-values, and that makes me wonder (otherwise why would they change p-values to e-values? To my understanding they represent different things in statistics). There is also no explanation of how the e-values are generated. I'm also having a difficult time understanding what each of the four e-values indicates. Do you by any chance know what each of them is testing for?

written 3.7 years ago by milk841103

e-val2 should never be > 1, since the README specifically describes it as a probability, so if you are getting values >> 1, you might want to do a bit of troubleshooting or e-mail the author.

written 3.7 years ago by Vivek
Eric T. (San Francisco, CA) wrote, 3.7 years ago:

It looks like this e-value means the same thing as in BLAST statistics: the number of hits of this significance that we would expect to observe by chance in a genome or database of this size. For small values (e.g. below 0.05) the e-value and p-value converge on the same number, but as p-values approach 1.0, the e-values instead grow above 1. An e-value >>1 means something similar to a p-value with leading 9's, i.e. the call is almost certainly due to chance and not significant under the null model.
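
To make the convergence concrete, here is a small sketch of the relationship used in BLAST statistics, P = 1 - exp(-E), which follows if the number of chance hits is Poisson-distributed with mean E. Whether CNVnator's e-values obey exactly this formula is an assumption; the README does not say. The function names `e_to_p` and `p_to_e` are made up for this example.

```python
import math


def e_to_p(e_value: float) -> float:
    """P(at least one chance hit) when chance hits are Poisson with mean E.

    Computes 1 - exp(-E); expm1 keeps precision for very small E.
    """
    return -math.expm1(-e_value)


def p_to_e(p_value: float) -> float:
    """Inverse of e_to_p: E = -ln(1 - P). Requires 0 <= P < 1."""
    return -math.log1p(-p_value)


if __name__ == "__main__":
    # Small E is nearly identical to P; large E maps to P just below 1.
    for e in (1e-6, 0.001, 0.05, 1.0, 10.0):
        print(f"E = {e:<8g} P = {e_to_p(e):.6g}")
```

Under this assumed relationship, a cut-off of 0.05 selects almost the same calls whether the column is read as an e-value or as a p-value, while values far above 1 only make sense as e-values.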
