Question: Filtering VCF with vcf-tools using --maf=0.0001 filters out all homozygous calls ?!
gravatar for fwuffy
2.2 years ago by
Michigan, USA
fwuffy100 wrote:

I know there are some good homozygous calls in this VCF, yet vcf-tools can not see them for whatever reason. Any filtering for MAF or alt read count results in total elimination of all 1/1 homozygous calls.

an example VCF row:

chr3    69313555    rs1483895   C   T   222 .   AC1=2;AC=4;AF1=1;AN=4;ANN=T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005989.1|protein_coding|1/23|c.163-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|NM_015123.2|protein_coding|1/22|c.163-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005992.1|protein_coding|1/23|c.1-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005994.1|protein_coding|1/22|c.1-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_005264722.1|protein_coding|1/22|c.28-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005990.1|protein_coding|1/23|c.28-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005993.1|protein_coding|1/22|c.1-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005991.1|protein_coding|2/24|c.1-38G>A||||||;ASP;CAF=0.393,0.607;COMMON=1;DP4=0,0,4,30;DP=36;FQ=-77.9863;G5;G5A;GENEINFO=FRMD4B:23150;GNO;HD;INT;KGPhase1;KGPhase3;MQ0F=0;MQ=60;MQSB=1;RS=1483895;RSPOS=69313555;RV;SAO=0;SF=0,1;SGB=-0.690438;SLO;SSR=0;TOPMED=0.394567,0.605433;VC=SNV;VDB=0.0219339;VLD;VP=0x05010008000517053f000100;WGT=1;dbSNPBuildID=88  GT:DP:PL    1/1:17:255,51,0 1/1:17:255,51,0

please note the duplicate GT:DP:PL fields at the end. This may be a problem.


vcftools --vcf my.vcf  --minDP 5 --minQ 30 --maf 0.00001 --remove-indels --recode --recode-INFO-all -c | grep '1/1:'

... nothing


The duplicate last column was not the problem, I removed it and got the same result. This problem occurs with both --maf and --mac > 0. --mac 0 returns rows, --mac 1 returns none.

variants filtering vcf-tools vcf • 1.7k views
ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by fwuffy100

I would encourage you to not use VCFtools for this. Switch to BCFtools and then try again. Due to the wide range of formatting that VCFs can have, these tools can sometimes struggle to correctly parse the information. I admit that it is difficult for the developers of these tools to account for every possibility.

I think that you may need something like bcftools view -i 'MAF[0]<0.0001'

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by Kevin Blighe60k

If that doesn't work, take a look at my script here, which can technically allow you to manual calculate the MAF, indirectly: A: How to get sample names and genotype for SNP in multi-sample VCF file

ADD REPLYlink written 2.2 years ago by Kevin Blighe60k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1152 users visited in the last hour