Question: Filtering VCF with vcf-tools using --maf=0.0001 filters out all homozygous calls ?!
0
gravatar for fwuffy
7 months ago by
fwuffy50
Michigan, USA
fwuffy50 wrote:

I know there are some good homozygous calls in this VCF, yet vcf-tools can not see them for whatever reason. Any filtering for MAF or alt read count results in total elimination of all 1/1 homozygous calls.

an example VCF row:

chr3    69313555    rs1483895   C   T   222 .   AC1=2;AC=4;AF1=1;AN=4;ANN=T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005989.1|protein_coding|1/23|c.163-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|NM_015123.2|protein_coding|1/22|c.163-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005992.1|protein_coding|1/23|c.1-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005994.1|protein_coding|1/22|c.1-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_005264722.1|protein_coding|1/22|c.28-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005990.1|protein_coding|1/23|c.28-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005993.1|protein_coding|1/22|c.1-38G>A||||||,T|intron_variant|MODIFIER|FRMD4B|FRMD4B|transcript|XM_017005991.1|protein_coding|2/24|c.1-38G>A||||||;ASP;CAF=0.393,0.607;COMMON=1;DP4=0,0,4,30;DP=36;FQ=-77.9863;G5;G5A;GENEINFO=FRMD4B:23150;GNO;HD;INT;KGPhase1;KGPhase3;MQ0F=0;MQ=60;MQSB=1;RS=1483895;RSPOS=69313555;RV;SAO=0;SF=0,1;SGB=-0.690438;SLO;SSR=0;TOPMED=0.394567,0.605433;VC=SNV;VDB=0.0219339;VLD;VP=0x05010008000517053f000100;WGT=1;dbSNPBuildID=88  GT:DP:PL    1/1:17:255,51,0 1/1:17:255,51,0

please note the duplicate GT:DP:PL fields at the end. This may be a problem.

command:

vcftools --vcf my.vcf  --minDP 5 --minQ 30 --maf 0.00001 --remove-indels --recode --recode-INFO-all -c | grep '1/1:'

... nothing

EDIT:

The duplicate last column was not the problem, I removed it and got the same result. This problem occurs with both --maf and --mac > 0. --mac 0 returns rows, --mac 1 returns none.

ADD COMMENTlink modified 7 months ago • written 7 months ago by fwuffy50
1

I would encourage you to not use VCFtools for this. Switch to BCFtools and then try again. Due to the wide range of formatting that VCFs can have, these tools can sometimes struggle to correctly parse the information. I admit that it is difficult for the developers of these tools to account for every possibility.

I think that you may need something like bcftools view -i 'MAF[0]<0.0001'

ADD REPLYlink modified 7 months ago • written 7 months ago by Kevin Blighe30k
1

If that doesn't work, take a look at my script here, which can technically allow you to manual calculate the MAF, indirectly: A: How to get sample names and genotype for SNP in multi-sample VCF file

ADD REPLYlink written 7 months ago by Kevin Blighe30k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1132 users visited in the last hour