Question: Calculate population allele frequencies from a vcf file including multiple populations
7 weeks ago, gwenola.tosser wrote:

I have a VCF file with about 800 diploid individuals and millions of SNPs. The individuals can be divided into 15 to 25 populations. I would like to calculate the allele frequency of each SNP in each population. Does anyone have an R script that does this? Thank you.


With millions of SNPs, it is better to use bcftools.

Comment written 7 weeks ago by zx8754
7 weeks ago, Vitis (New York) wrote:

I have found BGT to be a very convenient tool for slicing and querying genotypes from large VCF files. Once the genotypes are sliced (by region or by sample, for example by the individuals in each population), it should be straightforward to calculate allele frequencies for any variant in each population.

https://github.com/lh3/bgt

Or you could directly tap into the VCF file using pyvcf and fetch sample and genotype information for your allele frequency calculations.

https://pyvcf.readthedocs.io/en/latest/
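To make the "fetch sample and genotype information" idea concrete, here is a minimal sketch in plain Python. It deliberately does not use PyVCF; it parses uncompressed VCF text with the standard library only, so it is an illustration rather than something tuned for millions of SNPs. The function name `per_population_alt_freq`, the sample IDs, and the population labels are all made up for the example. It assumes the GT field is present, treats every non-reference allele as ALT (so it is only meaningful for biallelic sites), and skips missing calls.

```python
import re
from collections import defaultdict


def per_population_alt_freq(vcf_lines, sample_to_pop):
    """Yield (chrom, pos, ref, alt, {population: ALT frequency}) per site.

    Assumptions (hypothetical helper, not part of any library):
    - vcf_lines is an iterable of uncompressed VCF text lines
    - sample_to_pop maps sample IDs from the #CHROM header to labels
    - any allele other than "0" or "." counts as ALT
    - populations with no called alleles at a site are omitted
    """
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs start at column 10
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")
        alt_count = defaultdict(int)
        total = defaultdict(int)
        for sample, call in zip(samples, fields[9:]):
            pop = sample_to_pop.get(sample)
            if pop is None:
                continue  # sample not assigned to any population
            genotype = call.split(":")[gt_index]
            for allele in re.split(r"[/|]", genotype):
                if allele == ".":
                    continue  # missing allele, not counted
                total[pop] += 1
                if allele != "0":
                    alt_count[pop] += 1
        freqs = {pop: alt_count[pop] / total[pop] for pop in total}
        yield chrom, int(pos), ref, alt, freqs


# Tiny made-up example: four diploid samples in two populations.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
    "\tind1\tind2\tind3\tind4",
    "1\t100\trs1\tA\tG\t.\t.\t.\tGT\t0/0\t0/1\t1/1\t./.",
    "1\t200\trs2\tC\tT\t.\t.\t.\tGT\t0|1\t0|1\t0|0\t1|1",
]
example_pops = {"ind1": "popA", "ind2": "popA", "ind3": "popB", "ind4": "popB"}

results = list(per_population_alt_freq(example_vcf, example_pops))
for chrom, pos, ref, alt, freqs in results:
    print(chrom, pos, ref, alt, freqs)
```

For a real file you would pass an open file handle (or a `gzip.open(..., "rt")` handle) instead of the list, and build `sample_to_pop` from your own population table.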

7 weeks ago, chrchang523 (United States) wrote:
plink2 --vcf [VCF path] --freq --pheno [population-file path] --loop-cats [name of population col.]

(https://www.cog-genomics.org/plink/2.0/) should work, if your population file has sample IDs in the first column and population labels in the second.

If the population file has no header line, use "PHENO1" as the --loop-cats argument. If there is a header line, it may be necessary to change the header for the sample ID column to "#IID".
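In case it helps, here is a small sketch of writing a population file in the layout described above: tab-separated, sample IDs in the first column under a "#IID" header, population labels in the second. The helper name `write_population_file`, the column name "POP", the sample IDs, and the file names are all placeholders chosen for illustration. The command is only assembled and printed, not run.

```python
import os
import tempfile


def write_population_file(sample_to_pop, path, pop_column="POP"):
    """Write a two-column, tab-separated population file with a header.

    Hypothetical helper: the first column header is "#IID" and the
    second is pop_column, matching the layout described above.
    """
    with open(path, "w") as handle:
        handle.write(f"#IID\t{pop_column}\n")
        for sample, pop in sample_to_pop.items():
            handle.write(f"{sample}\t{pop}\n")


# Made-up sample-to-population assignment.
example_pops = {"ind1": "popA", "ind2": "popA", "ind3": "popB", "ind4": "popB"}

pop_path = os.path.join(tempfile.mkdtemp(), "populations.tsv")
write_population_file(example_pops, pop_path)

# Assemble the command from the answer; "input.vcf" is a placeholder path.
command = [
    "plink2", "--vcf", "input.vcf", "--freq",
    "--pheno", pop_path, "--loop-cats", "POP",
]
print(" ".join(command))
```

Because the file has a header line here, the --loop-cats argument is the header of the population column ("POP" in this sketch) rather than "PHENO1".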
