A paper describing a new tool, CopywriteR, was recently published. It performs copy number alteration (CNA) profiling on targeted sequencing data (e.g. whole-exome sequencing, WES). The "revolution" here is that CNA detection is based on off-target reads instead of on-target reads. This sidesteps the problem that exome baits vary widely in capture efficiency, both from bait to bait and between samples, studies and batches. As a result, the signal-to-noise ratio is drastically improved and approaches that of whole-genome sequencing. Additional benefits are that no reference sample is required and that CNAs outside the targeted regions (for WES, the non-exonic genome) can be quantified.
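To make the off-target idea concrete, here is a minimal Python sketch. It is not CopywriteR's code: the function name, the toy coordinates and the simple "outside every capture interval" rule are all assumptions made for illustration. It only shows the principle of discarding reads that land on the baits and counting the rest in fixed-size bins.

```python
from bisect import bisect_right
from collections import Counter


def off_target_bin_counts(read_positions, targets, bin_size):
    """Count reads that fall outside all capture targets, per genomic bin.

    targets: sorted, non-overlapping half-open (start, end) intervals
             on a single chromosome (hypothetical input format).
    Returns a Counter mapping bin index -> number of off-target reads.
    """
    starts = [start for start, _ in targets]
    counts = Counter()
    for pos in read_positions:
        # Last target whose start is <= pos; it is the only one that can contain pos.
        i = bisect_right(starts, pos) - 1
        on_target = i >= 0 and pos < targets[i][1]
        if not on_target:
            counts[pos // bin_size] += 1
    return counts


# Toy data: two capture targets and seven read start positions.
targets = [(1000, 1200), (5000, 5300)]
reads = [150, 1050, 1190, 2500, 2600, 5100, 7400]
counts = off_target_bin_counts(reads, targets, bin_size=1000)
print(dict(counts))  # {0: 1, 2: 2, 7: 1}
```

The three reads inside the targets are ignored, so bait-to-bait differences in capture efficiency never enter the counts.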
The authors divided the tool into 3 stages:
- preprocessing
- identification of off-target reads and log2(copy number) (LRR) quantification
- segmentation and plotting
This modular design allows you, for example, to use CopywriteR for the preprocessing and LRR quantification while using your own choice of tool for segmentation and visualization.
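As a rough picture of what that hand-off could look like, the sketch below turns per-bin counts into LRR values and passes them to a separate segmentation step. Everything in it is an assumption for illustration: `log2_ratios` simply normalises to the median bin (a real pipeline would typically also correct for biases such as GC content), and `naive_segments` is a toy stand-in for whatever established segmentation method you prefer.

```python
import math
from statistics import median


def log2_ratios(bin_counts):
    """log2 of each bin count relative to the median bin; empty bins give None."""
    centre = median(bin_counts)
    if centre <= 0:
        raise ValueError("median bin count must be positive")
    return [math.log2(c / centre) if c > 0 else None for c in bin_counts]


def naive_segments(lrr, tol=0.3):
    """Toy segmentation: extend a segment while each new bin stays within
    `tol` of the running segment mean. Returns (first_bin, last_bin, mean)."""
    segments = []
    start, last, values = None, None, []
    for i, value in enumerate(lrr):
        if value is None:
            continue  # bins without data neither break nor extend a segment
        if values and abs(value - sum(values) / len(values)) > tol:
            segments.append((start, last, sum(values) / len(values)))
            values = []
        if not values:
            start = i
        values.append(value)
        last = i
    if values:
        segments.append((start, last, sum(values) / len(values)))
    return segments


bin_counts = [100, 100, 100, 200, 200, 100, 50]
lrr = log2_ratios(bin_counts)       # quantification step
segments = naive_segments(lrr)      # swap in your own segmentation here
print(segments)
# [(0, 2, 0.0), (3, 4, 1.0), (5, 5, 0.0), (6, 6, -1.0)]
```

The only thing the two steps share is the list of LRR values, which is what makes it easy to replace the second step.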
The only drawback I can think of is that with very good capture efficiency the amount of off-target data drops, and the signal-to-noise ratio drops with it. In those cases I think you could increase the bin size (trading away resolution) to still get good-quality CNA profiles.
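A back-of-the-envelope sketch of that trade-off, assuming the per-bin read counts behave roughly like Poisson counts (my assumption, so the relative noise scales as 1/sqrt(count)). The bin sizes and counts are made up, and `merge_bins` is a hypothetical helper, not part of CopywriteR.

```python
import math


def merge_bins(counts, factor):
    """Sum every `factor` adjacent bins; a trailing partial group is dropped."""
    usable = len(counts) - len(counts) % factor
    return [sum(counts[i:i + factor]) for i in range(0, usable, factor)]


def relative_poisson_noise(mean_count):
    """Standard deviation divided by mean for a Poisson count."""
    return 1 / math.sqrt(mean_count)


# Hypothetical off-target counts in eight small bins (say 20 kb each).
small_bins = [25, 30, 20, 25, 28, 22, 24, 26]
large_bins = merge_bins(small_bins, 4)  # four-fold larger bins: [100, 100]

noise_small = relative_poisson_noise(sum(small_bins) / len(small_bins))
noise_large = relative_poisson_noise(sum(large_bins) / len(large_bins))
print(large_bins, noise_small, noise_large)  # [100, 100] 0.2 0.1
```

Under this assumption, making the bins four times larger halves the relative noise, at the price of a four times coarser profile.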
If you want to try it, make sure you have R version >= 3.2 and Bioconductor version >= 3.1.
Disclaimer: I was not involved in the design/development of this tool, nor involved in publication, finances or anything else. I just think that the off-target strategy is the best solution available for whole-exome sequencing CNA profiling.