7.6 years ago by
There are several strategies for finding structural variants (SVs) in whole-genome or exome NGS data. First, with paired-end data, you can examine the distribution of insert sizes between read pairs and infer SVs from pairs with unusual insert sizes. Second, you can scan the genome/exome for regions with unusually high or low coverage. This is the only approach with which you can estimate copy number (I don't know how accurate that is). Third, you can use reads that get split during mapping, which may span SV breakpoints. Finally, de novo assembly followed by traditional comparative genomics approaches can also help with SV discovery. Of course, you can combine all of these approaches and keep the candidates with the highest confidence.
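To make the insert-size strategy above concrete, here is a minimal sketch in Python. It assumes you have already extracted one insert size per read pair from your alignments (for example, from the TLEN column of a SAM/BAM file). The function name `flag_discordant_pairs` and the `n_mads` cutoff are made up for illustration and do not come from any particular SV caller. Real tools also use read orientation and cluster the discordant pairs before calling an SV.

```python
from statistics import median


def flag_discordant_pairs(insert_sizes, n_mads=5.0):
    """Return indices of read pairs whose insert size is an outlier.

    Uses the median and the median absolute deviation (MAD), which are
    robust to the outliers we are trying to find. `n_mads` is an
    illustrative cutoff, not a recommended value.
    """
    med = median(insert_sizes)
    mad = median([abs(x - med) for x in insert_sizes])
    # 1.4826 * MAD approximates one standard deviation for normal data.
    # Fall back to 1.0 if all insert sizes are (nearly) identical.
    scale = 1.4826 * mad if mad > 0 else 1.0
    return [
        i for i, x in enumerate(insert_sizes)
        if abs(x - med) > n_mads * scale
    ]


# Toy data: most pairs are near 300 bp; one is far too long (possible
# deletion) and one is far too short (possible insertion).
sizes = [300, 310, 295, 305, 290, 300, 2500, 315, 298, 40]
print(flag_discordant_pairs(sizes))  # -> [6, 9]
```

A single outlier pair means little on its own; candidates become convincing only when several discordant pairs point to the same region.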
I have heard that CNVnator is a pretty good coverage-based tool for genomic data, but I am not sure whether it will perform well with exome data. Considering the size and distribution of exons, the split-read method seems attractive. My personal experience involves a genomic data set: we assembled the genomic reads de novo, used a traditional tool like MUMmer to identify the SVs, and verified them with coverage-based approaches. It worked quite well, but I don't know how de novo assembly would perform for exome data (I heard the Trinity pipeline is rising as a good tool for de novo assembly of transcriptome or exome).
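As a rough illustration of the coverage-based idea mentioned above, the sketch below turns per-window read depth into an integer copy-number estimate. It is not how CNVnator works internally. It simply assumes that the median window depth corresponds to the normal diploid state and ignores GC bias, mappability, and (for exome data) uneven capture efficiency, all of which matter in practice. The function name `estimate_copy_number` and the `ploidy` parameter are hypothetical.

```python
from statistics import median


def estimate_copy_number(window_depths, ploidy=2):
    """Estimate an integer copy number for each fixed-size window.

    Assumes the median depth across windows represents `ploidy` copies,
    then scales each window's depth relative to that baseline.
    """
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    return [round(ploidy * depth / baseline) for depth in window_depths]


# Toy data: mean depth per window along a chromosome.
depths = [30, 31, 29, 30, 15, 16, 30, 61, 59, 30]
print(estimate_copy_number(depths))  # -> [2, 2, 2, 2, 1, 1, 2, 4, 4, 2]
```

In this toy example, windows 4 and 5 look like a heterozygous deletion and windows 7 and 8 like a duplication. With exome data the depth per target varies much more, so you would normally compare against a matched control or a panel of samples instead of a single median.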
There is a nice review in Nature Reviews Genetics. It covers everything I mentioned and much more: http://www.nature.com/nrg/journal/v12/n5/full/nrg2958.html