5.7 years ago by
Washington University in St. Louis, MO
Copy number analysis from NGS is based on the idea of read depth - that is, all other things being equal, a region with a copy number of 4x will have twice as many reads as a region with a copy number of 2x. This works well with whole genome sequencing. Even though there are slight biases due to things like mappability and GC content, they can be corrected for fairly easily, and accurate copy number can be assessed.
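As a minimal sketch of that read-depth arithmetic: the function name and the depths below are made up for illustration, and it assumes the biases have already been corrected and that you have a reference region whose copy number you know.

```python
def estimate_copy_number(region_depth, reference_depth, reference_copy_number=2):
    """Scale a region's read depth against a reference region of known copy number.

    Hypothetical helper: assumes depths are already corrected for
    mappability and GC content.
    """
    if reference_depth <= 0:
        raise ValueError("reference_depth must be positive")
    return reference_copy_number * region_depth / reference_depth


# Hypothetical mean depths: a diploid reference region at 30 reads,
# and a test region with twice that depth.
print(estimate_copy_number(60, 30))
```

A region with twice the depth of a diploid reference comes out at a copy number of 4, and one with half the depth comes out at 1.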
Exome sequencing, or targeted capture, is a whole different story. Factors like the GC content of the probes, the concentration of DNA, and even the temperature of hybridization, will make the number of reads captured by each probe different. These differences are often dramatic. This means that even for two regions that are both at a copy number of 2x, the number of sequencing reads can be off by orders of magnitude.
This data is essentially useless for copy number calling using standard methods. There is, however, one exception. Some smart people have figured out that if a tumor sample and its matched normal are prepared at the same time, under the same conditions (same tech, same reagent batch, etc.), then the biases will be roughly the same, and you can use the ratio between the two as a reasonable proxy for copy number. At the moment, I know of no other way to get accurate calls from capture-based sequencing.
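To make the tumor/normal ratio idea concrete, here is a rough sketch, not any particular tool's method. The function name, the pseudocount, and the read counts are all assumptions for illustration. It also centers the ratios on their median, which assumes most targets are copy-neutral; that is my own simplification and will not hold for every tumor.

```python
import math
from statistics import median


def tumor_normal_log2_ratios(tumor_counts, normal_counts, pseudocount=0.5):
    """Per-target log2(tumor/normal), centered on the median ratio.

    Hypothetical helper. The pseudocount avoids division by zero on
    empty targets; median-centering assumes most targets are copy-neutral.
    """
    if len(tumor_counts) != len(normal_counts):
        raise ValueError("tumor and normal must cover the same targets")
    raw = [
        math.log2((t + pseudocount) / (n + pseudocount))
        for t, n in zip(tumor_counts, normal_counts)
    ]
    center = median(raw)
    return [r - center for r in raw]


# Made-up counts for four capture targets. The probes differ wildly in
# efficiency, but the bias is shared between the two samples.
normal = [1000, 10, 200, 50]
tumor = [1000, 10, 400, 50]

ratios = tumor_normal_log2_ratios(tumor, normal)
# Convert back to an approximate copy number, assuming a diploid normal.
copy_numbers = [2 * 2 ** r for r in ratios]
print([round(c, 1) for c in copy_numbers])
```

The raw counts here span two orders of magnitude between targets, yet in the ratio only the third target stands out, because the probe-specific bias cancels when both samples were captured under the same conditions.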
Undoubtedly, people are working on the problem, and I don't claim that it's intractable - just quite difficult. In the meantime, there isn't a good way to get the information you're looking for from the data you have. Perhaps you should consider an alternative assay. Running a 500k SNP chip will give you fairly good resolution at a reasonably low price.