I have data from thousands of samples that have undergone WES. I am interested in reliably detecting whole-gene deletions of a single gene of interest, which according to very good literature should occur 3% of the time. My impression is that CNV calling from WES data has some issues due to uneven coverage, so I want to have some quality control.
If I had a couple of true positives and negatives, I could evaluate the quality of a calling approach. I don't have these.
My idea today was to simulate true positives (double all reads over the gene) and true negatives (randomly delete 95% of reads over the gene), then use these for QC.
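To make the idea concrete, here is a minimal sketch of the read manipulation I have in mind. It is only an illustration under assumptions: reads are plain `(chrom, start, end)` tuples with half-open coordinates rather than real BAM records, the gene coordinates are made up, and the function names `overlaps` and `simulate_depth_change` are hypothetical, not from any library.

```python
import random

def overlaps(read, region):
    """True if a (chrom, start, end) read overlaps a (chrom, start, end) region."""
    chrom, start, end = read
    r_chrom, r_start, r_end = region
    return chrom == r_chrom and start < r_end and end > r_start

def simulate_depth_change(reads, region, factor, seed=0):
    """Scale read depth over `region` by `factor`; reads elsewhere are untouched.

    factor=2.0 emits every overlapping read twice; factor=0.05 keeps each
    overlapping read with probability 0.05 (i.e. drops about 95% of them).
    """
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    out = []
    for read in reads:
        if not overlaps(read, region):
            out.append(read)
            continue
        copies = int(factor)                  # guaranteed whole copies
        if rng.random() < factor - copies:    # fractional part as a probability
            copies += 1
        out.extend([read] * copies)
    return out

# Toy data (assumption): 1000 reads of length 100 tiled every 10 bp on chr1.
reads = [("chr1", i * 10, i * 10 + 100) for i in range(1000)]
gene = ("chr1", 2000, 4000)  # made-up gene coordinates

doubled = simulate_depth_change(reads, gene, 2.0)
depleted = simulate_depth_change(reads, gene, 0.05)

n_gene = sum(overlaps(r, gene) for r in reads)
print(n_gene, len(doubled), sum(overlaps(r, gene) for r in depleted))
```

On real data the same logic would have to be applied to the aligned reads in each BAM file, and emitting exact copies of a read may interact with duplicate marking, which this sketch ignores.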
It seems almost too simple. Is there any reason in the technology or biology that would make this a terrible idea?