I'm working to establish a pipeline for germline CNV calling. Our target consists of 150 genes with approximately 1500 exons. Our data is HiSeq data (96 samples per run).
To establish a good reference set, I'm following this paper to calculate the inter-sample variation in coverage, using the rpkmCV for surveyed exons across the reference samples selected by ExomeDepth. I know the basic formula is SD/mean, i.e. the standard deviation of an exon's RPKM across samples divided by its mean, but I'm not sure how to apply it to my data.
Does anyone know how to do this? Or have any suggestions on QC for the same?
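
For reference, this is roughly how I understand the calculation, as a minimal Python sketch rather than the paper's actual method. It assumes a per-sample, per-exon read count matrix is already available, that RPKM is the usual reads × 10^9 / (exon length × total mapped reads), and that the total defaults to the sum of on-target counts when no per-sample totals are given. The function names `rpkm` and `rpkm_cv` are made up for illustration.

```python
from statistics import mean, stdev

def rpkm(counts, exon_lengths, total_mapped):
    """RPKM per exon for one sample.

    counts[i] is the read count for exon i, exon_lengths[i] its length in bp,
    total_mapped the number of mapped reads for that sample.
    """
    return [c * 1e9 / (length * total_mapped)
            for c, length in zip(counts, exon_lengths)]

def rpkm_cv(count_matrix, exon_lengths, totals=None):
    """Per-exon coefficient of variation (SD / mean) of RPKM across samples.

    count_matrix[s][i] is the read count for sample s, exon i.
    Needs at least two samples, because the sample SD is undefined for one.
    Returns None for exons whose mean RPKM is zero.
    """
    if totals is None:
        # Assumption: normalise by on-target reads when no totals are supplied.
        totals = [sum(row) for row in count_matrix]
    per_sample = [rpkm(row, exon_lengths, t)
                  for row, t in zip(count_matrix, totals)]
    cvs = []
    for exon_values in zip(*per_sample):  # transpose: one tuple per exon
        m = mean(exon_values)
        cvs.append(stdev(exon_values) / m if m > 0 else None)
    return cvs

# Toy example: 2 samples x 1 exon of 1000 bp, 1e6 mapped reads each.
example = rpkm_cv([[10], [30]], [1000], totals=[1_000_000, 1_000_000])
```

The per-exon values could then be summarised (for example by their median) to compare candidate reference sets, but which summary and threshold to use is exactly the part I'm unsure about.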