vcftools warnings worrisome?
3 months ago
8armed ▴ 10

I obtained raw variant calls by running GATK's GenotypeGVCFs on a combined VCF file with default settings:

gatk GenotypeGVCFs -R REFERENCE.ASSEMBLY.fa --variant combined.vcf -output combined_RAW.variants.vcf.gz

I then ran vcftools to filter these raw variants:

vcftools --gzvcf combined_RAW.variants.vcf.gz --out combined_filtered.variants.vcf --recode --recode-INFO-all --remove-indels --minDP 5 --mac 5 --minGQ 20 --minQ 30 --max-missing 0.8

I then get the following warnings. I tried an older GATK version and the warnings remained. Are these warnings worrying, and if so, how can I fix them?

Using zlib version: 1.2.13
Warning: Expected at least 2 parts in FORMAT entry: ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not intended to describe called alleles">
Warning: Expected at least 2 parts in FORMAT entry: ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
Warning: Expected at least 2 parts in FORMAT entry: ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
Warning: Expected at least 2 parts in FORMAT entry: ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
After filtering, kept 320 out of 320 Individuals
Outputting VCF file...
After filtering, kept 13713 out of a possible 441686 Sites
Run Time = 113.00 seconds

VCFtools - 0.1.16
(C) Adam Auton and Anthony Marcketta 2009

vcftools is no longer actively maintained. Switch to bcftools please.


But it seems that not all filtering options are supported by bcftools; at least, I cannot find all of them. The full set of genotype and site filters that I'm using from vcftools in this and subsequent analyses is: --remove-indels --not-chr --minDP --max-meanDP --mac --maf --minGQ --minQ --max-missing. Are these all available in bcftools?

Overall, it seems that the filtering is working but I just get these errors...


First off, they're not errors, they're warnings. Your output should probably be fine.
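One way to convince yourself that the warnings did not affect the output is to check that the recoded file holds the number of sites vcftools reported (13713 in the log above) and that the INFO/FORMAT definitions named in the warnings are still in its header. The sketch below uses only awk. The helper names count_sites and header_ids are made up for illustration, and the file name in the usage comment assumes the usual vcftools behaviour of appending ".recode.vcf" to the --out prefix.

```shell
# Count data records (non-header lines) in an uncompressed VCF read from stdin.
count_sites() {
  awk '!/^#/ { n++ } END { print n + 0 }'
}

# Print "INFO <ID>" or "FORMAT <ID>" for every INFO/FORMAT header line on stdin.
header_ids() {
  awk -F'[<,]' '/^##(INFO|FORMAT)=<ID=/ {
    sub(/^##/, "", $1); sub(/=$/, "", $1); sub(/^ID=/, "", $2)
    print $1, $2
  }'
}

# Usage (file name assumed from the --out prefix in the question):
#   count_sites < combined_filtered.variants.vcf.recode.vcf
#   header_ids  < combined_filtered.variants.vcf.recode.vcf
```

If the site count matches the log and the flagged IDs (AC, AF, MLEAC, MLEAF, PGT, PID, PL, RGQ) are listed, the recoded file at least carries the same header definitions and record count that vcftools reported.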

To translate a vcftools command to bcftools, you will need to read through the bcftools documentation and work out how to translate the logic, not just look for equivalent flags. The options corresponding to --not-chr and --remove-indels seem pretty apparent from my cursory glance. Did you look into the bcftools view options at all?


--remove-indels --not-chr --minDP --max-meanDP --mac --maf --minGQ --minQ --max-missing Are these all available in bcftools?

Yes, see the -i argument: https://samtools.github.io/bcftools/bcftools.html#expressions
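As a rough illustration, the vcftools filters from the original command might be expressed with bcftools along the following lines. This is a sketch under assumptions, not a verified drop-in replacement: filter_with_bcftools is a hypothetical helper name, it assumes a bcftools build that ships the setGT and fill-tags plugins, and the two tools may treat edge cases such as multi-allelic sites differently, so compare the resulting site counts against the vcftools output before relying on it.

```shell
# Sketch only: a bcftools pipeline mirroring the vcftools filters in the question.
# filter_with_bcftools is a hypothetical helper; thresholds are copied from the
# vcftools command (--remove-indels --minDP 5 --mac 5 --minGQ 20 --minQ 30
# --max-missing 0.8). Intended for bash.
filter_with_bcftools() (
  set -euo pipefail
  in=$1
  out=$2
  # Site filters that do not depend on genotypes: drop indels, require QUAL >= 30.
  bcftools view -V indels -i 'QUAL>=30' -Ou "$in" |
    # Genotype filters: set calls with DP < 5 or GQ < 20 to missing.
    bcftools +setGT -Ou -- -t q -n . -i 'FMT/DP<5 | FMT/GQ<20' |
    # Recompute AN/AC so the allele-count filter sees the masked genotypes.
    bcftools +fill-tags -Ou -- -t AN,AC |
    # Genotype-dependent site filters: minor allele count >= 5 and at most
    # 20% missing genotypes (the counterpart of --max-missing 0.8).
    bcftools view -c 5:minor -i 'F_MISSING<=0.2' -Oz -o "$out"
)

# Usage (output name is an assumption):
#   filter_with_bcftools combined_RAW.variants.vcf.gz combined_filtered.bcftools.vcf.gz
```

The genotype filters run before the allele-count and missingness filters so that the latter are evaluated on the masked genotypes. The remaining options mentioned above should be expressible in a similar way, for example excluding a chromosome with the ^ prefix of -t in bcftools view, and mean depth or allele frequency thresholds through -i/-e expressions.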


OK, great, I will take a closer look. I just talked to some colleagues, though, and almost all of them are still using vcftools, so it seems to be quite commonly used still. Getting a sense of how severe the above warnings are could therefore still be useful to many.


It's got some legacy use, sure, but it will not keep up with newer VCF versions. Switch to bcftools before you end up introducing silent errors to data that you can no longer go back and correct.
