Question: VCFtools --freq output file
Inquisitive8995 wrote (4 weeks ago):

Hello, I am using VCFtools --freq to obtain allele frequencies from my VCF file, which contains 4 individuals. I ran the command below:

vcftools --vcf A.vcf --freq --out A.frq

My output file looks like below:

CHROM       POS     N_ALLELES  N_CHR  {ALLELE:FREQ}
contig_1    279875  2          0      T:-nan      C:-nan
contig_3    277244  3          0      G:-nan      A:-nan      T:-nan
contig_3    277247  2          0      C:-nan      T:-nan
contig_4    8794    2          0      A:-nan      G:-nan
contig_4    78125   2          8      G:0         A:1
contig_4    219961  2          8      G:0         C:1
contig_4    250382  2          8      T:0         C:1
contig_11   123877  2          6      T:0.166667  C:0.833333

I was unable to find a description of the output headers for the .frq file. Which allele comes first, the major allele or the minor allele? What does -nan signify? Please let me know where I can find this information.

Any help would be appreciated. Thank you so much!

Kevin Blighe wrote (4 weeks ago):

Hey,

The header for the output is included in the output itself. For example:

CHROM           POS     N_ALLELES   N_CHR   {ALLELE:FREQ}
contig_11       123877  2           6       T:0.166667  C:0.833333

From this, I can see that there are 2 unique alleles (N_ALLELES) at position contig_11:123877, and these are observed across 6 total alleles (N_CHR) - these have the following frequencies:

  • T:0.166667
  • C:0.833333

My crude mathematics tell me that there is 1 T base, and 5 C bases.

Kevin
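
The arithmetic above can be checked with a minimal Python sketch. It assumes whitespace-separated .frq rows laid out as in the thread; `parse_frq_line` and `allele_counts` are hypothetical helper names, not part of VCFtools. Rows with an N_CHR of 0 have no called genotypes to count, so the frequency would be 0/0, which is one plausible reading of -nan (an interpretation, not something confirmed in this thread).

```python
def parse_frq_line(line):
    """Split one data row of a vcftools .frq file into its fields."""
    fields = line.split()
    chrom, pos = fields[0], int(fields[1])
    n_alleles, n_chr = int(fields[2]), int(fields[3])
    freqs = {}
    for item in fields[4:]:
        # Each item looks like "T:0.166667"; "-nan" parses to float NaN.
        allele, freq = item.rsplit(":", 1)
        freqs[allele] = float(freq)
    return chrom, pos, n_alleles, n_chr, freqs


def allele_counts(n_chr, freqs):
    """Recover per-allele counts as frequency * N_CHR, rounded."""
    if n_chr == 0:
        # No called chromosomes: nothing to count (frequencies are NaN).
        return {allele: 0 for allele in freqs}
    return {allele: round(freq * n_chr) for allele, freq in freqs.items()}


row = "contig_11       123877  2       6       T:0.166667      C:0.833333"
chrom, pos, n_alleles, n_chr, freqs = parse_frq_line(row)
print(chrom, pos, allele_counts(n_chr, freqs))
```

For the contig_11 row this gives 1 T and 5 C, matching the hand calculation.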


Hi, thanks for your reply. Going by that calculation, can I confidently say that the first column is the MAF, so that I can use it for a MAF plot? And what should I infer from a SNP having three alleles, as seen at contig_3:277244? Do you have any suggestions about it? Thank you.

written 4 weeks ago by Inquisitive8995

For the multi-allelic site, you may want to remove those, or at least split them - see my answer here: A: Remove duplicate SNPs only based on SNP ID in bcftools

Irrespective of multi-alleles or not, I am not sure, given the history of these programs, that you can have 100% confidence that the first column always relates to the minor allele. I would implement a check via awk in order to detect the minor allele and then take that.

written 29 days ago by Kevin Blighe
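
A minimal sketch of such a check, written in Python instead of awk. It picks the lowest-frequency allele at each biallelic site instead of trusting column order. `minor_allele` is a hypothetical helper name, and the row layout is assumed to match the .frq output shown in the question. Sites with N_CHR of 0 or NaN frequencies are skipped, since no minor allele can be determined from them.

```python
import math


def minor_allele(line):
    """Return (allele, frequency) for the rarer allele at a biallelic site.

    Returns None for multi-allelic sites, sites with no called
    chromosomes (N_CHR == 0), or sites whose frequencies are NaN.
    """
    fields = line.split()
    n_alleles, n_chr = int(fields[2]), int(fields[3])
    if n_alleles != 2 or n_chr == 0:
        return None
    pairs = []
    for item in fields[4:]:
        allele, freq = item.rsplit(":", 1)
        pairs.append((allele, float(freq)))
    if any(math.isnan(freq) for _, freq in pairs):
        return None
    # On a 0.5/0.5 tie, min() keeps the first allele listed.
    return min(pairs, key=lambda pair: pair[1])


rows = [
    "contig_3        277244  3       0       G:-nan  A:-nan  T:-nan",
    "contig_4        78125   2       8       G:0     A:1",
    "contig_11       123877  2       6       T:0.166667      C:0.833333",
]
for row in rows:
    print(row.split()[:2], minor_allele(row))
```

When reading a real .frq file, the header line starting with CHROM would need to be skipped before calling the function on each row.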

Thank you. I suppose I can use,

bcftools view --max-alleles 2

for retaining positions with only 2 alleles. Then, I will try awk and obtain just the minor allele frequency.

written 29 days ago by Inquisitive8995