Question: GATK: How to filter variant sites of sample VCF that are not present in a set of control VCFs
0
gravatar for ravast
5.5 years ago by
ravast10
Netherlands
ravast10 wrote:

I have to select variants sites from my sample which are not present in the control samples. i have tries this  

java -jar GenomeAnalysisTK.jar -R /Users/bhaskaran/Desktop/1data/GRCm38_68.fa -T CombineVariants -V:CON1 /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/control/8404.vcf -V:CON2 /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/control/8405.vcf -V:CON3 /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/control/8406.vcf -V:CON4 /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/control/8407.vcf -V:CON5 /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/control/8408.vcf -V:TED /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/parental_22.vcf -o result101.vcf  && java -jar GenomeAnalysisTK.jar -R /Users/bhaskaran/Desktop/1data/GRCm38_68.fa -T SelectVariants -V result101.vcf -select "set == 'TED'"  -o result102.vcf 

 

Unfortunately, this command outputs absolutely no variant. No error message occured. Is some one could help me?

sequencing snp next-gen • 3.4k views
ADD COMMENTlink modified 5.5 years ago • written 5.5 years ago by ravast10
2
gravatar for ravast
5.5 years ago by
ravast10
Netherlands
ravast10 wrote:

Vivek as per your suggestion i tired this 

java -jar GenomeAnalysisTK.jar -R /Users/bhaskaran/Desktop/1data/GRCm38_68.fa -T CombineVariants -V:CON1 /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/control/8404.vcf -V:CON2 /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/control/8405.vcf -V:CON3 /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/control/8406.vcf -V:CON4 /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/control/8407.vcf -V:CON5 /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/control/8408.vcf -V:TED /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/parental_22.vcf -o result101.vcf -env

and then

awk '! /\#/' result101.vcf | grep -w "TED" | grep -vi "CON1\|CON2\|CON3\|CON4\|CON5\|Intersection" > TED_only.vcf

again im getting no result..but when i compared this file manually o could find many variants in my sample that are not found in controls

ADD COMMENTlink written 5.5 years ago by ravast10

I'm not sure how you are doing a manual check. Do you want the variant to NOT be present in atleast one control or ANY control?

Can you post what you get from

awk '! /\#/' result101.vcf | cut -f8 | awk -F ';' '{print $NF}' | sort | uniq -c
ADD REPLYlink modified 5.5 years ago • written 5.5 years ago by Vivek2.4k

I want variant not be present in atleast one control. I got this result..

3937677 set=Intersection

ADD REPLYlink written 5.5 years ago by ravast10
1

That pretty much tells you that you have 3,937,677 variants in the VCF file and all of them are common to all samples.

ADD REPLYlink written 5.5 years ago by Vivek2.4k

Ok.But i have a doubt...when i saw the result101.vcf ..for example...at chrm 1 position -     3010834    .    A    G    63.13    .    AC=1;AF=0.500;AN=2;BaseQRankSum=-2.428;DP=182;Dels=0.00;FS=24.892;HaplotypeScore=1.6524;InbreedingCoeff=-0.2223;MQ=44.54;MQ0=1;MQRankSum=0.170;QD=5.26;ReadPosRankSum=0.852;set=Intersection    GT:AD:DP:GQ:PL    ./.    ./.    ./.    ./.    ./.    0/1:2,5:7:19:80,0,19.

Does that mean for my sample, the variant at position 3010834  is present whereas it is absent in the controls...sorry if im wrong...

ADD REPLYlink written 5.5 years ago by ravast10
1

That looks like that variant was not called in the first 5 samples but if that were the case, GATK shouldn't tell you that its in set=intersection. You could look at the individual control sample VCFs and check if that variant entry is present in them and if so what genotype it has.

ADD REPLYlink written 5.5 years ago by Vivek2.4k

You are correct. Im trying the way as you said..

ADD REPLYlink written 5.5 years ago by ravast10

Its not geting

ADD REPLYlink written 5.5 years ago by ravast10
1
gravatar for Vivek
5.5 years ago by
Vivek2.4k
Denmark
Vivek2.4k wrote:

Does

awk '! /\#/' result101.vcf | grep -w "TED" | grep -vi "CON1\|CON2\|CON3\|CON4\|CON5\|Intersection" > TED_only.vcf

produce something?

ADD COMMENTlink written 5.5 years ago by Vivek2.4k
1
gravatar for ravast
5.5 years ago by
ravast10
Netherlands
ravast10 wrote:

i tried with individual file also ..But it is still showing no variants present.....

ADD COMMENTlink written 5.5 years ago by ravast10
0
gravatar for William
5.5 years ago by
William4.5k
Europe
William4.5k wrote:

You can use bedools substract :

http://bedtools.readthedocs.org/en/latest/content/tools/subtract.html

If you have the control samples in multiple vcf files you can first combine those to one vcf file.  And then use bedtools substract.

 

ADD COMMENTlink written 5.5 years ago by William4.5k
0
gravatar for ravast
5.5 years ago by
ravast10
Netherlands
ravast10 wrote:

Williams i tried with bedttools..but its showing no new variant..

ADD COMMENTlink written 5.5 years ago by ravast10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2075 users visited in the last hour