Question: VCF allele frequency is not sensible
1
gravatar for wrab425
18 months ago by
wrab42510
United Kingdom
wrab42510 wrote:

I have a vcf file containing the variants from 64 haploid strains. I need to be able to get the allele frequencies for further work using plink but when I run the vcftools command

vcftools --vcf merged_SNPs_only.vcf --freq --out freq_analysis

I get this, The problem is I have variants that are in more than one strain but the output suggests that the frequency of the allele is 0 or 1

What is wrong with my vcf or my command?

CHROM   POS     N_ALLELES       N_CHR   {ALLELE:FREQ}
chr1    4446    2       1       C:0     T:1
chr1    4570    2       1       T:0     C:1
chr1    4575    2       1       T:0     C:1
chr1    4595    2       1       A:0     C:1
chr1    4676    2       1       A:0     G:1
chr1    4726    2       2       T:0     C:1
chr1    4865    2       7       A:0     G:1
chr1    4870    2       7       T:0     C:1
chr1    4883    2       7       A:0     T:1
chr1    4952    2       4       G:0     T:1
chr1    5015    2       4       G:0     A:1
chr1    5098    2       7       A:0     C:1
chr1    5168    2       6       G:0     A:1
chr1    5252    2       7       A:0     T:1
chr1    5253    2       4       T:0     A:1
chr1    5265    2       7       C:0     T:1
chr1    5270    2       7       G:0     T:1
chr1    5302    2       2       T:0     G:1
ADD COMMENTlink modified 18 months ago • written 18 months ago by wrab42510

Does your VCF file have all 64 samples in it? Also, the plink tag is not relevant here as plink is not involved in your current situation.

ADD REPLYlink written 18 months ago by RamRS24k

Thanks Ram, my file does have all 64 samples in it. I put the plink in there because I get the same result when I look at allele frequencies using plink. so the problem is illustrated with a snp like chr1 5265 2 7 C:0 T:1 which has seven chromosomes out of the 64 in the set but the T allele frequency is 1 when by my understanding should be 7/64.

ADD REPLYlink written 18 months ago by wrab42510

What's your vcftools version?

ADD REPLYlink written 18 months ago by RamRS24k

Hi Ram the version: 0.1.15

ADD REPLYlink written 18 months ago by wrab42510

Yeah, the problem lies with your input file. Please follow my instructions on the gist excerpt so we can understand this better.

ADD REPLYlink written 18 months ago by RamRS24k

super thanks Ram, I have entered the command and put it into a gist but if it would help to see more of the file please ask. We seem to be about to make progress

ADD REPLYlink written 18 months ago by wrab42510

It would help if we had a few lines where you expect to see a different output than what you see. For example, chr1:5265. I'm not sure if my grep was badly written (I didn't test it), but it's supposed to have produced 10 lines. Remove the -m1 and replace the -A11 with -C11, and put that in the gist if you can please? The output of that along with the line for chr1:5265 should be really helpful.

ADD REPLYlink written 18 months ago by RamRS24k

Actually, if plink is giving you the same output, the problem probably lies with the data being piped in to the tools. Are you working on the file directly or is any other tool involved? Can you paste an except of your VCF in a GitHub gist and paste the link to the gist in your post please? Run this to get the excerpt:

grep -m1 -A11 "^#CHROM" vcf_file >file_to_be_pasted_to_a_github_gist
ADD REPLYlink modified 18 months ago • written 18 months ago by RamRS24k
chr1    4446    .   C   T   128 PASS    AC=1;AF=1.00;AN=1;BaseQRankSum=-0.566;ClippingRankSum=-0.382;DP=11;FS=0.000;MLEAC=1;MLEAF=1.00;MQ=26.01;MQRankSum=0.978;QD=11.64;ReadPosRankSum=0.000;SF=2;SOR=1.802    GT:GQ:DP:PL:AD  .:.:.:.:.   .:.:.:.:.   1:99:11:.,.,.:3,8   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.   .:.:.:.:.

and

ADD REPLYlink modified 18 months ago by genomax74k • written 18 months ago by wrab42510

This agrees with the frequency, in that only one sample has a variant at this locus.

ADD REPLYlink written 18 months ago by RamRS24k

Hi Ram,

We do appreciate your advice as we are stuck with this. The thing is that there are snps that are present in 30 or so of the samples and they also give a MAF of 0 so this seems wrong. If you look at the second gitgist there are some snps in that sample which are present in seven samples and of course they have a MAF of 0. I think it must be because the vcf is haploid but I am not sure hw to fix it.

Thanks,

William

ADD REPLYlink written 18 months ago by wrab42510

Your gist has only one row(chr1:4446) in it. I don't see a second gist.

EDIT: OK, you haven't linked it here. The gist is

It's weird - I guess the haploid explanation fits - I've never seen VCF with a single value in GT instead of a x/y or x|y.

ADD REPLYlink modified 18 months ago • written 18 months ago by RamRS24k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1945 users visited in the last hour