Question: Removing rows with a "./." genotype from a VCF file with awk
Asked 6.2 years ago by ivivek_ngs (Seattle, WA, USA):


I need help with awk. My VCF file has 4 samples, so fields $10, $11, $12 and $13 hold the genotype of each sample on every row. I want to remove every row in which at least one sample has the genotype "./." and write the remaining rows to another VCF file. Can this be done? I am not very familiar with awk's substr. Any assistance? Below is an example of my VCF file; it does not have a header.

chr3    75787186    rs150410646    C    T    53.89    .    AC=4;AF=0.500;AN=8;BaseQRankSum=-4.341;DB;DP=424;Dels=0.00;FS=0.000;HaplotypeScore=2.2684;MLEAC=4;MLEAF=0.500;MQ=6.41;MQ0=371;MQRankSum=-3.553;QD=0.13;ReadPosRankSum=-1.007    GT:AD:DP:GQ:PL    0/1:63,21:80:48:48,0,127    0/1:25,5:29:21:21,0,64    0/1:142,41:174:10:10,0,94    0/1:95,31:120:6:6,0,120
chr3    75787576    rs141348932    A    G    61.87    .    AC=2;AF=1.00;AN=2;DB;DP=195;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=4.17;MQ0=189;QD=0.69    GT:AD:DP:GQ:PL    ./.    ./.    1/1:68,22:86:9:87,9,0    ./.
chr3    75787583    rs144348996    A    G    100.62    .    AC=2;AF=1.00;AN=2;DB;DP=203;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=4.33;MQ0=197;QD=1.12    GT:AD:DP:GQ:PL    ./.    ./.    1/1:65,25:86:12:126,12,0    ./.
chr3    75787584    rs151027881    C    A    93.62    .    AC=2;AF=1.00;AN=2;DB;DP=203;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=4.33;MQ0=197;QD=1.04    GT:AD:DP:GQ:PL    ./.    ./.    1/1:64,26:86:12:119,12,0    ./.
chr3    75787620    rs145606249    T    C    153.42    .    AC=2;AF=1.00;AN=2;DB;DP=224;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=4.38;MQ0=217;QD=1.70    GT:AD:DP:GQ:PL    ./.    ./.    1/1:52,38:86:18:179,18,0    ./.
chr3    75787728    rs111389701    C    T    643.34    .    AC=8;AF=1.00;AN=8;DB;DP=186;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=8;MLEAF=1.00;MQ=10.21;MQ0=140;QD=3.46    GT:AD:DP:GQ:PL    1/1:0,32:32:3:28,3,0    1/1:0,23:23:9:82,9,0    1/1:0,82:82:51:503,51,0    1/1:0,49:49:6:55,6,0

To restate: if any of columns $10, $11, $12 or $13 contains "./." (no genotype called), I want to eliminate that row. Sorry for the formatting; I could not get it to display correctly. Any suggestions?

Tags: snp, vcftools, vcf

Why don't you use vcftools? Its --max-missing 1 option excludes every site that has missing genotype data.

(Comment by Giovanni M Dall'Olio, 6.0 years ago)

Thanks a lot. I figured it out with the following command, which deletes every line containing the text "./.":

sed '/\.\/\./d' input.vcf > out.vcf
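Because this sed command matches "./." anywhere on a line, it would also delete a header line that happened to contain that text. A minimal header-safe sketch, assuming a standard VCF whose header lines start with "#" (the function name drop_missing_sed and the file names are hypothetical placeholders), restricts the deletion to data lines:

```shell
# Hypothetical helper: read a VCF on stdin, write the filtered VCF to stdout.
# Lines starting with '#' (the VCF header) are always kept; on every other
# line, the line is deleted if it contains the literal text "./." anywhere.
drop_missing_sed() {
  sed '/^#/!{/\.\/\./d;}'
}

# Usage (placeholder file names):
#   drop_missing_sed < input.vcf > out.vcf
```

Like the original one-liner, it still checks the whole data line rather than only the sample columns.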

(Reply by ivivek_ngs, 6.2 years ago)

grep -vw with appropriate escaping of the dots would do it as well.
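A sketch of that grep approach, assuming GNU grep's -w semantics (other grep implementations may treat a pattern that starts with punctuation differently under -w). The dots are backslash-escaped so they match only a literal ".", and -w requires the match not to be glued to letters, digits or underscores. The helper name drop_missing_grep is a hypothetical placeholder:

```shell
# Hypothetical helper: read a VCF on stdin and print only the lines that do
# NOT contain "./." as a whole word (-v inverts the match, -w requires the
# match to be bounded by non-word characters or the line edges).
# In the pattern, "\." is a literal dot; a bare "." would match any character.
drop_missing_grep() {
  grep -vw '\./\.'
}

# Usage (placeholder file names):
#   drop_missing_grep < input.vcf > out.vcf
```

Like the sed version, this also filters header lines, and grep exits with status 1 if every line ends up removed.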

(Reply by vlaufer, 6.2 years ago)
Answer by Matt Shirley (Cambridge, MA), 6.0 years ago:

You might take a look at vawk, a tool from Aaron Quinlan's group that acts as an intelligent wrapper for awk on VCF files.


@Matt Shirley, that seems like quite an amazing tool. However, I already worked out how to do this for my VCF file and posted it as a reply above, though it seems others missed it: it is a one-liner that removes the rows with missing genotypes. Thanks, everyone, for the other clever approaches, and thanks, Matt, for the tool; it looks useful for other things I am interested in.

(Reply by ivivek_ngs, 6.0 years ago)
Answer by axelwilhelm, 6.0 years ago:

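Something like the following. The four conditions must be joined with && (comma-separated patterns are parsed as a range pattern, which is a syntax error with four parts), and awk string positions start at 1, so substr should start at 1:

awk 'substr($10,1,3)!="./." && substr($11,1,3)!="./." && substr($12,1,3)!="./." && substr($13,1,3)!="./."' input.vcf > out.vcf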
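The same test can be generalized so that it does not hard-code four samples. The sketch below is an illustration, not part of the original answer (the helper name drop_missing_awk is a hypothetical placeholder). It keeps header lines and, on each data line, looks at every sample column from $10 to the last field. It compares only the GT subfield, the text before the first ":", which the VCF specification requires to be the first FORMAT key when present; so a sample written as ./.:0,0 is also treated as missing.

```shell
# Hypothetical helper: read a VCF on stdin, write the filtered VCF to stdout.
# Uses awk's default field splitting (runs of spaces or tabs), which works both
# for real tab-delimited VCFs and for the space-separated example in the question.
drop_missing_awk() {
  awk '
    /^#/ { print; next }                 # keep header lines unchanged
    {
      for (i = 10; i <= NF; i++) {       # every sample column, however many
        split($i, parts, ":")            # parts[1] is the GT value
        if (parts[1] == "./.") next      # any missing genotype: drop the row
      }
      print                              # all genotypes called: keep the row
    }
  '
}

# Usage (placeholder file names):
#   drop_missing_awk < input.vcf > out.vcf
```

If your INFO values may contain spaces, add -F '\t' to the awk call so that only tabs separate fields.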