3.5 years ago
Tintest


I’m trying to use the GATK4 germline CNV calling pipeline. I successfully got 57 VCFs from my sample batch, called on segments (obtained by merging contiguous intervals), formatted like a classic VCF:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  2046745451-1006_S4
M       3288    CNV_M_3288_15907        N       <DEL>,<DUP>     .       .       END=15907       GT:CN:NP:QA:QS:QSE:QSS  2:5:9:17:6:21:21
1       69071   CNV_1_69071_70028       N       <DEL>,<DUP>     .       .       END=70028       GT:CN:NP:QA:QS:QSE:QSS  1:0:1:204:204:204:204

But I get far too many of these intervals, more than 10k. I would like to know whether an existing tool can count the distinct segments (variants / intervals sharing 75% or more of their length) in one VCF and report, for each segment, how many segments from the other samples in my batch overlap it. By finding the most recurrent segments, I could determine which ones are background noise and perhaps reduce the number of variants in my VCF by filtering.
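To make the request concrete, here is a minimal Python sketch of the counting I have in mind. It is only an illustration under several assumptions: the function names (`parse_segments`, `reciprocal_overlap`, `recurrence_counts`) are hypothetical and not part of GATK, the 75% threshold is applied reciprocally (to both segments), and each segment is treated as the closed interval from POS to END.

```python
def parse_segments(vcf_lines):
    """Return (chrom, start, end) tuples from the text lines of a segments VCF."""
    segments = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()
        chrom, pos, info = fields[0], int(fields[1]), fields[7]
        # END is read from the INFO column, as in the records shown above.
        end = None
        for item in info.split(";"):
            if item.startswith("END="):
                end = int(item[4:])
        if end is not None:
            segments.append((chrom, pos, end))
    return segments


def reciprocal_overlap(a, b, min_frac=0.75):
    """True if a and b share at least min_frac of the length of BOTH segments."""
    if a[0] != b[0]:
        return False  # different chromosomes never overlap
    # Assumption: segments are closed intervals [POS, END].
    shared = min(a[2], b[2]) - max(a[1], b[1]) + 1
    if shared <= 0:
        return False
    len_a = a[2] - a[1] + 1
    len_b = b[2] - b[1] + 1
    return shared >= min_frac * len_a and shared >= min_frac * len_b


def recurrence_counts(samples, min_frac=0.75):
    """Map (sample, segment) to the number of OTHER samples with a matching segment.

    `samples` is a dict: sample name -> list of (chrom, start, end) tuples.
    """
    counts = {}
    for name, segs in samples.items():
        for seg in segs:
            counts[(name, seg)] = sum(
                any(reciprocal_overlap(seg, other, min_frac) for other in other_segs)
                for other_name, other_segs in samples.items()
                if other_name != name
            )
    return counts
```

Segments with a high count would be the candidates for background noise. The comparison here is pairwise and quadratic, so for 57 samples with more than 10k segments each it would need grouping by chromosome or sorting to be practical.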

Thank you.

