I am a newcomer to bioinformatics and don't have a strong statistics background. While using CNVnator v0.3.2 for a project recently, I ran into some questions when trying to make sense of the results from the CNV calling step. I would really appreciate it if anyone could help me with them.
I have seen in many papers and older posts that people refer to the values produced by the CNV calling step as p-values and filter their raw CNV calls using p = 0.05 as the cutoff. However, in the README file of the newest version of CNVnator, these values are referred to as e-values instead of p-values. Does anyone know what has changed in the newest version of CNVnator? Are the e-values converted from the p-values (which are calculated from the t-test), and if so, how? Or can the e-values and p-values in the output be treated interchangeably?
For e-val2, what is meant by "the region to be in the tail of Gaussian distribution"? Can I interpret this value as the significance of the call being a CNV?
For e-val3 and e-val4, what is meant by "for the middle of CNV", and what is the purpose of looking specifically at the middle of the CNV?
Here is the information given in the CNVnator README file about the output, for your reference:
normalized_RD -- normalized to 1.
e-val1 -- is calculated using t-test statistics.
e-val2 -- is from the probability of RD values within the region to be in the tails of a gaussian distribution describing frequencies of RD values in bins.
e-val3 -- same as e-val1 but for the middle of CNV
e-val4 -- same as e-val2 but for the middle of CNV
q0 -- fraction of reads mapped with q0 quality
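To make the question concrete, here is a minimal Python sketch of the kind of filtering I see described in older posts, applied to e-val1. It rests on several assumptions of mine: that each output line is whitespace-separated in the order CNV_type, coordinates (as chr:start-end), CNV_size, then the fields listed above; that 0.05 is a sensible cutoff for an e-value, which is exactly what I am unsure about; and the names `CnvCall`, `parse_call` and `filter_calls` are hypothetical helpers, not part of CNVnator. The two example lines are invented for illustration and are not real results.

```python
from typing import Iterable, List, NamedTuple


class CnvCall(NamedTuple):
    """One CNV call; field order is an assumed reading of the README."""
    cnv_type: str
    chrom: str
    start: int
    end: int
    size: int
    normalized_rd: float
    e_val1: float
    e_val2: float
    e_val3: float
    e_val4: float
    q0: float


def parse_call(line: str) -> CnvCall:
    """Parse one whitespace-separated call line (assumed column layout)."""
    fields = line.split()
    cnv_type, coords, size = fields[0], fields[1], fields[2]
    # Split on the last ':' so contig names containing ':' still work.
    chrom, span = coords.rsplit(":", 1)
    start, end = span.split("-")
    rd, e1, e2, e3, e4, q0 = (float(x) for x in fields[3:9])
    # int(float(...)) tolerates sizes written like "1e+04".
    return CnvCall(cnv_type, chrom, int(start), int(end),
                   int(float(size)), rd, e1, e2, e3, e4, q0)


def filter_calls(lines: Iterable[str], cutoff: float = 0.05) -> List[CnvCall]:
    """Keep calls whose e-val1 is below the cutoff (cutoff is an assumption)."""
    calls = [parse_call(line) for line in lines if line.strip()]
    return [call for call in calls if call.e_val1 < cutoff]


# Invented example lines, not real CNVnator output.
example_lines = [
    "deletion\tchr1:10001-20000\t10000\t0.25\t1.5e-10\t2.0e-08\t3.0e-09\t4.0e-07\t0.02",
    "duplication\tchr2:50001-52000\t2000\t1.6\t0.3\t0.5\t1\t1\t0.1",
]

kept = filter_calls(example_lines)
for call in kept:
    print(call.cnv_type, call.chrom, call.start, call.end, call.e_val1)
```

Whether a filter like this is statistically meaningful depends on how the e-values relate to p-values, which is what I am asking about.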
Thank you in advance for comments and help!