Question: Filtering VCF with vcf-tools using --maf=0.0001 filters out all homozygous calls ?!
gravatar for fwuffy
5 weeks ago by
Michigan, USA
fwuffy50 wrote:

I know there are some good homozygous calls in this VCF, yet vcf-tools can not see them for whatever reason. Any filtering for MAF or alt read count results in total elimination of all 1/1 homozygous calls.

an example VCF row:

chr3    69313555    rs1483895   C   T   222 .   AC1=2;AC=4;AF1=1;AN=4;ANN=T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005989.1|protein_coding|1/23|c.163-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|NM_015123.2|protein_coding|1/22|c.163-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005992.1|protein_coding|1/23|c.1-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005994.1|protein_coding|1/22|c.1-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_005264722.1|protein_coding|1/22|c.28-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005990.1|protein_coding|1/23|c.28-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005993.1|protein_coding|1/22|c.1-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005991.1|protein_coding|2/24|c.1-38G>A||||||;ASP;CAF=0.393,0.607;COMMON=1;DP4=0,0,4,30;DP=36;FQ=-77.9863;G5;G5A;GENEINFO=FRMD4B:23150;GNO;HD;INT;KGPhase1;KGPhase3;MQ0F=0;MQ=60;MQSB=1;RS=1483895;RSPOS=69313555;RV;SAO=0;SF=0,1;SGB=-0.690438;SLO;SSR=0;TOPMED=0.394567,0.605433;VC=SNV;VDB=0.0219339;VLD;VP=0x05010008000517053f000100;WGT=1;dbSNPBuildID=88  GT:DP:PL    1/1:17:255,51,0 1/1:17:255,51,0

please note the duplicate GT:DP:PL fields at the end. This may be a problem.


vcftools --vcf my.vcf  --minDP 5 --minQ 30 --maf 0.00001 --remove-indels --recode --recode-INFO-all -c | grep '1/1:'

... nothing


The duplicate last column was not the problem, I removed it and got the same result. This problem occurs with both --maf and --mac > 0. --mac 0 returns rows, --mac 1 returns none.

ADD COMMENTlink modified 5 weeks ago • written 5 weeks ago by fwuffy50

I would encourage you to not use VCFtools for this. Switch to BCFtools and then try again. Due to the wide range of formatting that VCFs can have, these tools can sometimes struggle to correctly parse the information. I admit that it is difficult for the developers of these tools to account for every possibility.

I think that you may need something like bcftools view -i 'MAF[0]<0.0001'

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by Kevin Blighe17k

If that doesn't work, take a look at my script here, which can technically allow you to manual calculate the MAF, indirectly: A: How to get sample names and genotype for SNP in multi-sample VCF file

ADD REPLYlink written 5 weeks ago by Kevin Blighe17k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1573 users visited in the last hour