Filtering VCF with vcf-tools using --maf=0.0001 filters out all homozygous calls ?!
0
0
Entering edit mode
5.0 years ago
fwuffy ▴ 100

I know there are some good homozygous calls in this VCF, yet vcf-tools can not see them for whatever reason. Any filtering for MAF or alt read count results in total elimination of all 1/1 homozygous calls.

an example VCF row:

chr3    69313555    rs1483895   C   T   222 .   AC1=2;AC=4;AF1=1;AN=4;ANN=T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005989.1|protein_coding|1/23|c.163-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|NM_015123.2|protein_coding|1/22|c.163-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005992.1|protein_coding|1/23|c.1-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005994.1|protein_coding|1/22|c.1-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_005264722.1|protein_coding|1/22|c.28-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005990.1|protein_coding|1/23|c.28-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005993.1|protein_coding|1/22|c.1-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005991.1|protein_coding|2/24|c.1-38G>A||||||;ASP;CAF=0.393,0.607;COMMON=1;DP4=0,0,4,30;DP=36;FQ=-77.9863;G5;G5A;GENEINFO=FRMD4B:23150;GNO;HD;INT;KGPhase1;KGPhase3;MQ0F=0;MQ=60;MQSB=1;RS=1483895;RSPOS=69313555;RV;SAO=0;SF=0,1;SGB=-0.690438;SLO;SSR=0;TOPMED=0.394567,0.605433;VC=SNV;VDB=0.0219339;VLD;VP=0x05010008000517053f000100;WGT=1;dbSNPBuildID=88  GT:DP:PL    1/1:17:255,51,0 1/1:17:255,51,0


please note the duplicate GT:DP:PL fields at the end. This may be a problem.

command:

vcftools --vcf my.vcf  --minDP 5 --minQ 30 --maf 0.00001 --remove-indels --recode --recode-INFO-all -c | grep '1/1:'


... nothing

EDIT:

The duplicate last column was not the problem, I removed it and got the same result. This problem occurs with both --maf and --mac > 0. --mac 0 returns rows, --mac 1 returns none.

vcf vcftools variants • 4.5k views
1
Entering edit mode

I would encourage you to not use VCFtools for this. Switch to BCFtools and then try again. Due to the wide range of formatting that VCFs can have, these tools can sometimes struggle to correctly parse the information. I admit that it is difficult for the developers of these tools to account for every possibility.

I think that you may need something like bcftools view -i 'MAF[0]<0.0001'

1
Entering edit mode

If that doesn't work, take a look at my script here, which can technically allow you to manual calculate the MAF, indirectly: A: How to get sample names and genotype for SNP in multi-sample VCF file