VCF allele frequency is not sensible
0
1
Entering edit mode
6.0 years ago
wrab425 ▴ 50

I have a vcf file containing the variants from 64 haploid strains. I need to be able to get the allele frequencies for further work using plink but when I run the vcftools command

vcftools --vcf merged_SNPs_only.vcf --freq --out freq_analysis

I get this, The problem is I have variants that are in more than one strain but the output suggests that the frequency of the allele is 0 or 1

What is wrong with my vcf or my command?

CHROM   POS     N_ALLELES       N_CHR   {ALLELE:FREQ}
chr1    4446    2       1       C:0     T:1
chr1    4570    2       1       T:0     C:1
chr1    4575    2       1       T:0     C:1
chr1    4595    2       1       A:0     C:1
chr1    4676    2       1       A:0     G:1
chr1    4726    2       2       T:0     C:1
chr1    4865    2       7       A:0     G:1
chr1    4870    2       7       T:0     C:1
chr1    4883    2       7       A:0     T:1
chr1    4952    2       4       G:0     T:1
chr1    5015    2       4       G:0     A:1
chr1    5098    2       7       A:0     C:1
chr1    5168    2       6       G:0     A:1
chr1    5252    2       7       A:0     T:1
chr1    5253    2       4       T:0     A:1
chr1    5265    2       7       C:0     T:1
chr1    5270    2       7       G:0     T:1
chr1    5302    2       2       T:0     G:1
vcftools vcf plink allele frequencies • 1.9k views
ADD COMMENT
0
Entering edit mode

Does your VCF file have all 64 samples in it? Also, the plink tag is not relevant here as plink is not involved in your current situation.

ADD REPLY
0
Entering edit mode

Thanks Ram, my file does have all 64 samples in it. I put the plink in there because I get the same result when I look at allele frequencies using plink. so the problem is illustrated with a snp like chr1 5265 2 7 C:0 T:1 which has seven chromosomes out of the 64 in the set but the T allele frequency is 1 when by my understanding should be 7/64.

ADD REPLY
0
Entering edit mode

What's your vcftools version?

ADD REPLY
0
Entering edit mode

Hi Ram the version: 0.1.15

ADD REPLY
0
Entering edit mode

Yeah, the problem lies with your input file. Please follow my instructions on the gist excerpt so we can understand this better.

ADD REPLY
0
Entering edit mode

super thanks Ram, I have entered the command and put it into a gist but if it would help to see more of the file please ask. We seem to be about to make progress

ADD REPLY
0
Entering edit mode

It would help if we had a few lines where you expect to see a different output than what you see. For example, chr1:5265. I'm not sure if my grep was badly written (I didn't test it), but it's supposed to have produced 10 lines. Remove the -m1 and replace the -A11 with -C11, and put that in the gist if you can please? The output of that along with the line for chr1:5265 should be really helpful.

ADD REPLY
0
Entering edit mode

Actually, if plink is giving you the same output, the problem probably lies with the data being piped in to the tools. Are you working on the file directly or is any other tool involved? Can you paste an except of your VCF in a GitHub gist and paste the link to the gist in your post please? Run this to get the excerpt:

grep -m1 -A11 "^#CHROM" vcf_file >file_to_be_pasted_to_a_github_gist
ADD REPLY
0
Entering edit mode
chr1    4446    .   C   T   128 PASS    AC=1;AF=1.00;AN=1;BaseQRankSum=-0.566;ClippingRankSum=-0.382;DP=11;FS=0.000;MLEAC=1;MLEAF=1.00;MQ=26.01;MQRankSum=0.978;QD=11.64;ReadPosRankSum=0.000;SF=2;SOR=1.802    GT:GQ:DP:PL:AD  .:.:.:.:.   .:.:.:.:.   1:99:11:.,.,.:3,8   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.

and

ADD REPLY
0
Entering edit mode

This agrees with the frequency, in that only one sample has a variant at this locus.

ADD REPLY
0
Entering edit mode

Hi Ram,

We do appreciate your advice as we are stuck with this. The thing is that there are snps that are present in 30 or so of the samples and they also give a MAF of 0 so this seems wrong. If you look at the second gitgist there are some snps in that sample which are present in seven samples and of course they have a MAF of 0. I think it must be because the vcf is haploid but I am not sure hw to fix it.

Thanks,

William

ADD REPLY
0
Entering edit mode

Your gist has only one row(chr1:4446) in it. I don't see a second gist.

EDIT: OK, you haven't linked it here. The gist is

It's weird - I guess the haploid explanation fits - I've never seen VCF with a single value in GT instead of a x/y or x|y.

ADD REPLY

Login before adding your answer.

Traffic: 1523 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6