I have whole-genome sequences from 20 patients with a complex disease, and I would like to look for rare or novel structural variants (SVs) using several SV calling tools recommended by the 1000 Genomes Project, such as CNVnator, Pindel, and BreakDancer. I would also like to include at least 20 control WGS samples from the 1000 Genomes Project, to remove background noise when calling SVs.
I am familiar with each SV calling tool on its own, but I am confused about how to build a pipeline that integrates multiple tools across multiple samples.
The two pipelines I can think of are:
Pipeline 1 (tool-centric): Run each SV tool separately, but call multiple samples jointly (many programs now support multi-sample calling), which should increase sensitivity. Ideally I would run all 20 patients + 20 controls together, but I don't think my disk can hold that many large BAM files at once. So my plan is to run three batches of 6 patients + 6 controls each, and then merge the results.
Would such results be the same as running 20 patients + 20 controls together? In another post of mine, Zev.Kronenberg said that most programs apply their statistics on a per-library/per-sample basis, so this should be OK?
Once I have, for each SV calling tool, a VCF file containing all samples, how do I intersect or merge these files to find calls supported by multiple tools? With vcf-merge? vcf-isec?
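To make the "supported by multiple tools" idea concrete, here is a minimal sketch of one common way to define it: two calls of the same type on the same chromosome count as the same event if they overlap each other reciprocally by at least 50%. The `SV` record, the function names, the 50% threshold, and the example coordinates are all assumptions made up for illustration; real input would be parsed from each tool's VCF.

```python
from collections import namedtuple

# Hypothetical minimal record; a real pipeline would parse these from each tool's VCF.
SV = namedtuple("SV", ["chrom", "start", "end", "svtype", "tool"])

def reciprocal_overlap(a, b):
    """Return the smaller of (overlap / length of a) and (overlap / length of b)."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))

def supporting_tools(call, all_calls, min_ro=0.5):
    """Set of tools with at least one call matching `call` at reciprocal overlap >= min_ro."""
    return {c.tool for c in all_calls if reciprocal_overlap(call, c) >= min_ro}

def consensus_calls(all_calls, min_tools=2, min_ro=0.5):
    """Keep calls backed by at least `min_tools` distinct tools (the call's own tool counts)."""
    return [c for c in all_calls
            if len(supporting_tools(c, all_calls, min_ro)) >= min_tools]

# Made-up example coordinates, not real data.
calls = [
    SV("chr1", 10000, 12000, "DEL", "cnvnator"),
    SV("chr1", 10100, 11900, "DEL", "pindel"),
    SV("chr2", 50000, 51000, "DEL", "breakdancer"),
]
kept = consensus_calls(calls)
for c in kept:
    print(c.chrom, c.start, c.end, c.svtype, c.tool)
```

In this sketch the two chr1 deletions support each other and are both kept as original records, while the chr2 call is dropped because only one tool reports it. The all-against-all comparison is quadratic, so it would only suit short, already filtered call lists.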
Pipeline 2 (sample-centric): Run each SV tool, but this time independently for each sample. For each sample, first prioritize a list of the most confident SV calls; then merge the samples together.
Either way, I am looking for high-confidence rare/novel SVs, which should be very few and must pass very stringent filtering. So specificity matters more to me, even at the cost of sensitivity.
But the question is: when merging the high-confidence SV calls from each sample, I will very likely see something like:
chr1 14657 DEL
chr1 14569 DEL
They are the same call but with slightly different coordinates. How can I intersect them while retaining all the VCF information? With vcf-isec from VCFtools?
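As an illustration of the kind of merge I have in mind, here is a minimal sketch that groups calls of the same type whose positions lie within a tolerance of each other, while keeping every original record in the group. The 100 bp tolerance, the dictionary layout, the sample names, and the third record are assumptions added for the example; the sketch also compares start positions only and ignores the end coordinate.

```python
def cluster_calls(records, max_dist=100):
    """Group records with the same chromosome and SV type whose position is within
    max_dist of the previous record in the cluster (single-linkage on sorted positions)."""
    clusters = []
    ordered = sorted(records, key=lambda r: (r["chrom"], r["svtype"], r["pos"]))
    for rec in ordered:
        last = clusters[-1][-1] if clusters else None
        if (last is not None
                and last["chrom"] == rec["chrom"]
                and last["svtype"] == rec["svtype"]
                and rec["pos"] - last["pos"] <= max_dist):
            clusters[-1].append(rec)   # same event, keep the full original record
        else:
            clusters.append([rec])     # start a new event
    return clusters

# The first two records mirror the example above; sample names are hypothetical,
# and the third record is invented to show a call that stays separate.
records = [
    {"chrom": "chr1", "pos": 14657, "svtype": "DEL", "sample": "patient01"},
    {"chrom": "chr1", "pos": 14569, "svtype": "DEL", "sample": "patient02"},
    {"chrom": "chr1", "pos": 90000, "svtype": "DEL", "sample": "patient03"},
]
clusters = cluster_calls(records)
for cluster in clusters:
    samples = sorted(r["sample"] for r in cluster)
    print(cluster[0]["chrom"], cluster[0]["pos"], cluster[0]["svtype"], samples)
```

Because each cluster holds the untouched input records, any per-sample fields carried in them would still be available after merging. Single-linkage grouping can chain many nearby calls into one long cluster, so the tolerance would need to be chosen with the tools' breakpoint resolution in mind.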
I hope this is clear.