Question: How do I remove overlapping variants between an experimental group and a control group?
17 months ago, jaewoo.lee.1203 wrote:

Hi, I'm studying sequencing data analysis. I ran a variant calling pipeline and ended up with two sets of variants: one from the experimental group and one from the control group. I need to find out which changes occurred only in the experimental group, so I want to remove the variants that are shared by both groups. I tried SelectVariants in GATK and vcfremovesample in vcflib, but the number of variants was the same after filtering. Is there another way to remove the variants the two groups have in common? I'd be happy if anybody could suggest an idea. Thank you.

Tags: sequencing, snp
17 months ago, WouterDeCoster wrote:

This solution assumes that the ID column (column 3) of your VCF files contains sensible identifiers and that both files use the same naming system for them. It's not entirely clear from your description, but it sounds like you have one VCF for the controls and one VCF for the experimental group. If these assumptions are wrong, please add more information to your question.

First, make a file containing the identifiers seen in the controls:

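# Collect the IDs (column 3) of all control records, skipping header lines and missing IDs (".")
grep -v '^#' controls.vcf | cut -f3 | grep -v -x -F '.' > variants_found_in_controls.txt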
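Because this approach depends entirely on the ID column, it may be worth first checking how many records have no identifier at all (a "." in column 3); callers typically leave the ID empty unless the calls were annotated against a database such as dbSNP. The two functions below are hypothetical helpers written for this post, not part of GATK or vcflib, and they assume uncompressed, tab-separated VCF files:

```shell
# Hypothetical helper: count data records whose ID column (column 3) is exactly "."
count_missing_ids() {
    # Drop header lines, keep the ID column, count lines that are exactly "."
    grep -v '^#' "$1" | cut -f3 | grep -c -x -F '.'
}

# Hypothetical helper: count all data (non-header) records in a VCF
count_records() {
    grep -c -v '^#' "$1"
}
```

For example, if count_missing_ids controls.vcf prints nearly the same number as count_records controls.vcf, matching on IDs will not work reliably for those records.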

Next, use this file to filter the experimental group:

# -F: match IDs as fixed strings, -w: whole words only, -v: keep lines that match none of them
grep -F -w -v -f variants_found_in_controls.txt experimental.vcf > variants_only_in_experimental.vcf
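If your VCFs have no usable IDs, a variant can instead be matched on its position and alleles. The function below is a hypothetical sketch, not an existing tool. It assumes uncompressed VCFs that use the same reference and chromosome names, with multi-allelic sites represented the same way in both files (normalizing both files first, for example with bcftools norm, would help). It keeps every header line of the experimental file and drops each record whose CHROM, POS, REF and ALT (columns 1, 2, 4 and 5) also appear in the controls:

```shell
# Hypothetical helper: print the experimental VCF ($2) without the records
# whose CHROM, POS, REF and ALT also occur in the control VCF ($1).
remove_shared_variants() {
    awk 'BEGIN { FS = "\t" }
         # First file (controls): remember the key of every data record
         FILENAME == ARGV[1] { if ($0 !~ /^#/) seen[$1 FS $2 FS $4 FS $5] = 1; next }
         # Second file (experimental): print headers and records not seen in the controls
         /^#/ || !(($1 FS $2 FS $4 FS $5) in seen)' "$1" "$2"
}
```

Usage: remove_shared_variants controls.vcf experimental.vcf > variants_only_in_experimental.vcf. Keying on all four columns rather than on position alone keeps a site where the experimental group carries a different alternate allele than the controls.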