I have a whole file of CNVnator calls like the ones below.
deletion chr1:179135401-179150100 14700 0.156562 1.08417e-11 2.871e+09 1.2549e-11 2.871e+09 1
deletion chr1:179161601-179166400 4800 0.0137354 3.32026e-11 1.94083e-64 5.69188e-11 8.27154e-72 1
deletion chr1:179181001-179194400 13400 0.239849 1.18935e-11 1.70262e-08 1.398e-11 6.34306e-06 1
The definition of the columns (after the CNV type, coordinates, and size) is:
normalized_RD -- normalized to 1.
p-val1 -- calculated using t-test statistics.
p-val2 -- from the probability of RD values within the region falling in the tails of a Gaussian distribution describing the frequencies of RD values in bins.
p-val3 -- same as p-val1, but for the middle of the CNV.
p-val4 -- same as p-val2, but for the middle of the CNV.
q0 -- fraction of reads mapped with q0 quality.
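To make the column layout concrete, here is a minimal parsing sketch. The field names are my own labels following the order described above, not identifiers that CNVnator itself uses.

```python
# Parse one whitespace-delimited CNVnator call line into named fields.
# Column names below are illustrative labels, not CNVnator's own.
COLUMNS = ["cnv_type", "coordinates", "size", "normalized_RD",
           "p_val1", "p_val2", "p_val3", "p_val4", "q0"]

def parse_call(line):
    rec = dict(zip(COLUMNS, line.split()))
    rec["size"] = int(rec["size"])
    for key in ("normalized_RD", "p_val1", "p_val2", "p_val3", "p_val4", "q0"):
        rec[key] = float(rec[key])
    return rec

call = parse_call("deletion chr1:179135401-179150100 14700 0.156562 "
                  "1.08417e-11 2.871e+09 1.2549e-11 2.871e+09 1")
print(call["q0"])  # 1.0 -- every read supporting this call has mapping quality 0
```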
The last column indicates that all the reads on which the deletion calls are based have mapping quality zero (q0 = 1). That is true for all 61850 calls, even for the ones covering regions where no reads are mapped at all, i.e. a copy number 1 -> 0 change. Is this normal?
A large part of the genome of many species consists of repetitive regions, for which by definition:
1) reads map with mapping quality 0
2) reads can only be placed uniquely if they already span the repeat
So when CNV calls based on mapping-quality-zero reads are excluded, can the read-depth method only be used to detect copy number 1 -> copy number 2 or copy number 1 -> copy number 0 events? Or should I also look into the deletion calls that are based on mapping-quality-zero reads?
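One way to frame the trade-off above is to partition calls by their q0 fraction rather than discarding them outright, so the repeat-dominated deletions can still be inspected separately. A minimal sketch, where the 0.5 cutoff is an arbitrary illustration and not a CNVnator recommendation:

```python
# Split CNVnator call lines on the q0 fraction (last column):
# calls below the cutoff are kept as "reliable"; the rest are set
# aside as repeat-dominated rather than silently dropped.
def split_by_q0(lines, cutoff=0.5):
    keep, repeat_like = [], []
    for line in lines:
        q0 = float(line.split()[-1])
        (keep if q0 < cutoff else repeat_like).append(line)
    return keep, repeat_like

calls = [
    "deletion chr1:179135401-179150100 14700 0.156562 1.08417e-11 2.871e+09 1.2549e-11 2.871e+09 1",
    "deletion chr1:179161601-179166400 4800 0.0137354 3.32026e-11 1.94083e-64 5.69188e-11 8.27154e-72 1",
]
keep, repeat_like = split_by_q0(calls)
print(len(keep), len(repeat_like))  # 0 2
```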