Hi, I'm studying sequencing data analysis. I ran a variant calling pipeline and ended up with two sets of variants: one from the experimental group and one from the control group. I want to know which changes occurred only in the experimental group, so I need to remove the variants that the two groups share. I tried SelectVariants in GATK and vcfremovesample in vcflib, but the number of variants was the same after the analysis. Is there another way to remove the variants that overlap between the two groups? I would be grateful for any suggestions. Thank you.
This solution assumes that the ID column (column 3) of your VCF files holds meaningful identifiers rather than the missing-value placeholder '.', and that both files use the same nomenclature. It's not clear from your explanation, but it sounds like you have one VCF for the controls and one VCF for the experimental group. If my assumptions are not correct, you'll have to add information to your question.
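Before relying on the ID column, it may be worth checking how many records actually have an identifier. The following is a small sketch, not part of any toolkit: `count_missing_ids` is a hypothetical helper name, and it only assumes a tab-separated, uncompressed VCF.

```shell
# Hypothetical helper: count data lines whose ID column (field 3) is ".".
count_missing_ids() {
    # Skip header lines (starting with "#"), then count records with ID ".".
    awk -F'\t' '!/^#/ && $3 == "." { n++ } END { print n + 0 }' "$1"
}

# Usage:
# count_missing_ids controls.vcf
# count_missing_ids experimental.vcf
```

If most records report a missing ID, filtering by identifier will not separate the two groups reliably.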
First, I make a file containing the identifiers seen in the controls, skipping the header lines and any record whose ID is missing ('.'):
grep -v '^#' controls.vcf | cut -f3 | grep -v -x -F '.' | sort -u > variants_found_in_controls.txt
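Next, use this file to filter the experimental group. The -F option makes grep treat each identifier as a literal string rather than a regular expression, and -w requires a whole-word match:
grep -F -w -v -f variants_found_in_controls.txt experimental.vcf > variants_only_in_experimental.vcf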
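If the ID column is not usable, the same idea can be applied to the variant coordinates instead. This is a minimal sketch, assuming that both files are uncompressed, tab-separated VCFs called against the same reference, that the controls file is not empty, and that a variant is identified by its CHROM, POS, REF and ALT fields. The function name `vcf_only_in_second` is made up for illustration. Multi-allelic records are compared as written, so differently normalized files may not match.

```shell
# Hypothetical helper: print the records of the second VCF whose
# CHROM/POS/REF/ALT key does not occur in the first VCF.
vcf_only_in_second() {
    awk -F'\t' '
        # First file (controls): remember the key of every data line.
        NR == FNR { if ($0 !~ /^#/) seen[$1 FS $2 FS $4 FS $5] = 1; next }
        # Second file (experimental): pass header lines through unchanged.
        /^#/ { print; next }
        # Keep data lines whose key was not seen in the controls.
        !(($1 FS $2 FS $4 FS $5) in seen)
    ' "$1" "$2"
}

# Usage (file names as in the commands above):
# vcf_only_in_second controls.vcf experimental.vcf > variants_only_in_experimental.vcf
```

Matching on coordinates and alleles does not depend on how the ID column was filled in, which may matter if the earlier attempts left the variant count unchanged.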