The meaning of '0' allele frequency in vcftools output
4.1 years ago

Hello!

The output of the vcftools '--freq' option can look like this, for example:

CHROM   POS N_ALLELES   N_CHR   {ALLELE:FREQ}
1   861276  2   698 A:1 G:0
1   861292  2   698 C:1 G:0
1   861298  2   698 G:1 A:0
1   861315  2   698 G:1 A:0


The columns look misaligned because the output is tab-delimited, but the point is that at each site one allele is listed with frequency 1 and the other with frequency 0. What does that mean? Why list the second allele at all if its frequency is 0?

Is it a rounding problem?

I have 349 individuals (698 chromosomes, as N_CHR shows), so I'm thinking a nonzero allele frequency can't be lower than 1/698, which is nowhere near the rounding limit of a double-precision float...
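
A quick sanity check of the rounding idea, as a minimal Python sketch. It assumes diploid samples, so that N_CHR = 698 corresponds to 349 individuals and a single allele copy would have frequency 1/698. The function name is hypothetical and has nothing to do with vcftools itself.

```python
import sys

def smallest_nonzero_frequency(n_chr: int) -> float:
    """Frequency of an allele carried by exactly one of n_chr chromosomes."""
    return 1 / n_chr

# N_CHR column from the output above (assumed: 349 diploid individuals).
n_chr = 698
min_freq = smallest_nonzero_frequency(n_chr)

print(min_freq)                       # smallest nonzero frequency possible here
print(sys.float_info.min)             # smallest positive normalized double
print(min_freq > sys.float_info.min)  # True if rounding to 0 is implausible
```

If the last line prints True, a single observed copy of the allele could not plausibly be rounded down to 0, so a reported 0 would more likely mean that no copies were observed.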

Grateful for any light shed on this! :-]

vcf vcftools MAF allele frequency allele • 2.5k views

Why couldn't it be 0, if they're all homozygous reference?
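
To illustrate, here is a minimal Python sketch, assuming diploid genotypes written VCF-style (`0/0`, `0/1`, ...). The helper `allele_frequencies` is hypothetical and is not how vcftools is implemented. It only shows that a site with a declared alternate allele but no carriers yields frequencies 1 and 0.

```python
from collections import Counter

def allele_frequencies(genotypes, n_alleles=2):
    """Return (n_chr, [frequency per allele index]) from GT strings like '0/0' or '0|1'."""
    counts = Counter()
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":  # skip missing calls
                counts[int(allele)] += 1
    n_chr = sum(counts.values())
    freqs = [counts[i] / n_chr if n_chr else float("nan") for i in range(n_alleles)]
    return n_chr, freqs

# Hypothetical site: 349 samples, all homozygous reference.
n_chr, freqs = allele_frequencies(["0/0"] * 349)
print(n_chr, freqs)  # 698 [1.0, 0.0]
```

The alternate allele still appears in the result because it is declared for the site, not because any sample carries it.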

Sure, but look at the positions. Some of them (e.g. 861277-861291) are skipped, and I take that to mean 'no alleles found'. So why are sites with an allele at frequency 0 listed at all?

That really depends on how you did the genotyping.

What exactly does that encompass?

I.e., did you use GATK? Samtools + FreeBayes? Something else? Joint calling? Was the output dataset filtered? I'm trying to work out why there are gaps in your report. A quality filter may have been applied, so that positions failing the quality thresholds were omitted, but without more information it's hard to say.

I don't know how the analysis that produced my source .vcf file was carried out; I only know that the output in my original post is from vcftools.

I'll ask the .vcf file supplier if they know...