Question: How do I remove variants shared between an experimental group and a control group?
2.5 years ago, jaewoo.lee.1203 wrote:

Hi, I'm learning sequencing data analysis. I ran a variant calling pipeline and ended up with two sets of variants: one from the experimental group and one from the control group. I want to know which changes occurred only in the experimental group, so I need to remove the variants that appear in both groups. I tried SelectVariants in GATK and vcfremovesample in vcflib, but the number of variants was the same after each analysis. Is there another way to remove the variants shared by the two groups? I would be grateful for any suggestions. Thank you.

sequencing snp • 706 views
2.5 years ago, WouterDeCoster wrote:

This solution assumes that the 'ID' column (the third column) of your VCF files holds a sensible identifier for each variant and that both files use the same nomenclature. It's not clear from your explanation, but it sounds like you have one VCF for the controls and one VCF for the experimental group. If my assumptions are not correct, you'll have to add information to your question.

First, I make a file containing the identifiers seen in the controls:

grep -v '^#' controls.vcf | cut -f3 > variants_found_in_controls.txt

Next, use this file for filtering the experimental group:

grep -w -v -F -f variants_found_in_controls.txt experimental.vcf > variants_only_in_experimental.vcf
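
If the ID column is mostly '.', the identifiers cannot tell variants apart and the approach above will not work as intended. In that case, one possible alternative is to compare variants by chromosome, position, REF and ALT (columns 1, 2, 4 and 5). The following is only a minimal sketch under those assumptions: the function name remove_shared_variants is made up for illustration, both files are assumed to be uncompressed and tab-separated, and the ALT field is compared as a whole string, so a multi-allelic record such as "G,T" will not match a control record with ALT "G".

```shell
# Hypothetical helper: print the header lines and the records of the second
# VCF whose CHROM/POS/REF/ALT combination does not occur in the first VCF.
# Usage: remove_shared_variants controls.vcf experimental.vcf
remove_shared_variants() {
    awk -F'\t' '
        # Build a comparison key from CHROM, POS, REF and ALT.
        { key = $1 FS $2 FS $4 FS $5 }
        # First file (controls): remember every non-header key, print nothing.
        FILENAME == ARGV[1] { if ($0 !~ /^#/) seen[key] = 1; next }
        # Second file (experimental): keep headers and unseen variants.
        /^#/ || !(key in seen)
    ' "$1" "$2"
}
```

It could then be called as: remove_shared_variants controls.vcf experimental.vcf > variants_only_in_experimental.vcf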