How do I remove variants shared between the experimental group and the control group?
6.5 years ago

Hi, I'm learning sequencing data analysis. I ran a variant calling pipeline and ended up with two sets of variants: one for the experimental group and one for the control group. I want to know which changes occurred only in the experimental group, so I need to remove the variants that the two groups share. I tried SelectVariants in GATK and vcfremovesample in vcflib, but the number of variants was the same after each. Is there another way to remove the variants shared by the two groups? I would be grateful for any suggestions. Thank you.

SNP sequencing • 1.3k views
6.5 years ago

This solution assumes you assigned a sensible ID to each variant in your VCF files and used the same naming scheme in both files. It's not clear from your explanation, but it sounds like you have one VCF for the controls and one VCF for the experimental group. If these assumptions are wrong, you'll have to add information to your question.

First, I make a file containing the identifiers seen in the controls, skipping the '.' placeholder that VCF uses for a missing ID:

grep -v '^#' controls.vcf | cut -f3 | grep -v -x -F '.' | sort -u > variants_found_in_controls.txt

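Next, use this file to filter the experimental group. The -F option makes grep treat each identifier as a literal string rather than a regular expression, and -w restricts matches to whole words:

grep -v -w -F -f variants_found_in_controls.txt experimental.vcf > variants_only_in_experimental.vcf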
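
If the ID column is empty or not consistent between the two files, the same idea can be keyed on position and alleles instead. The following is only a minimal sketch, assuming uncompressed, tab-delimited VCF files. The function name only_in_first is hypothetical and not part of any VCF toolkit. It compares CHROM, POS, REF and ALT as exact strings, so a multi-allelic ALT such as "A,T" will not match a plain "A".

```shell
# Hypothetical helper: print the records of the first VCF whose
# CHROM, POS, REF and ALT do not all match a record in the second VCF.
# Assumes plain-text (uncompressed), tab-delimited VCF files.
only_in_first() {
    awk -F'\t' '
        # While reading the second VCF (passed to awk first),
        # remember the key of every data line.
        FILENAME == ARGV[1] { if ($0 !~ /^#/) seen[$1, $2, $4, $5] = 1; next }
        # While reading the first VCF, print header lines and
        # any record whose key was not seen.
        /^#/ || !(($1, $2, $4, $5) in seen)
    ' "$2" "$1"
}
```

With the file names used above, it would be called as: only_in_first experimental.vcf controls.vcf > variants_only_in_experimental.vcf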