Question: Compare allele frequencies of SNPs in my data to their frequencies in different human populations: ExAC vs 1000 Genomes vs other?
gaelgarcia05 (UK) wrote, 16 months ago:

Hello,

I have a list of SNPs (in the form of a VCF) found in our [very] targeted sequencing dataset of ~15,000 individuals.

I am looking to compare the MAFs (minor allele frequencies) of these SNPs within this 'population' (our dataset) to their MAFs across different populations, such as the populations defined in ExAC or the 1000 Genomes Project.

Is there an effective way that you would recommend to do this?

These samples were processed using GRCh38. I believe the ExAC variants have coordinates based on the previous build, GRCh37 (please correct me if I'm wrong), so I'm unsure about using the MAFs from the ExAC data.

The output table I have in mind would look something like this:

snpID  MAF_mysamples  MAF_european  MAF_finnish  MAF_african  MAF_se_asian  MAF_asian

As always, your input is greatly appreciated.


Hello,

Ensembl have lifted-over versions of the 1000 Genomes, ExAC and gnomAD exome data for hg38. Have a look at this FTP directory.

You could use these files to annotate your VCF and then extract all the information in whatever format you like. What should your final output look like?
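A minimal sketch of that annotation step in Python, assuming both VCFs are on GRCh38, multi-allelic sites have been split to one ALT per line (e.g. with `bcftools norm -m -any`), and the reference file carries ExAC-style per-population `AC_<POP>`/`AN_<POP>` INFO fields. Those field names and all function names below are assumptions for illustration, so check the header of the file you actually download. Variants are matched on (CHROM, POS, REF, ALT) rather than on rsID, since IDs are often missing or differ between sources.

```python
import gzip
from typing import Dict, Iterable, Iterator, Optional, Tuple

# (CHROM without 'chr', POS, REF, ALT) identifies one biallelic variant
Key = Tuple[str, int, str, str]


def open_text(path: str):
    """Open a plain or gzip/bgzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def parse_info(info: str) -> Dict[str, str]:
    """Split an INFO string like 'AC=3;AN=10;DB' into a dict (flags map to '')."""
    fields = {}
    for item in info.split(";"):
        name, _, value = item.partition("=")
        fields[name] = value
    return fields


def iter_records(lines: Iterable[str]) -> Iterator[Tuple[Key, Dict[str, str]]]:
    """Yield (key, INFO dict) for every data line, skipping '#' header lines."""
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # harmonise 'chr1' (UCSC style) and '1' (Ensembl style)
        chrom = cols[0][3:] if cols[0].startswith("chr") else cols[0]
        yield (chrom, int(cols[1]), cols[3], cols[4]), parse_info(cols[7])


def pop_af(info: Dict[str, str], pop: str) -> Optional[float]:
    """AF of one population as AC_<pop>/AN_<pop>; None if missing or AN is 0."""
    try:
        ac = int(info["AC_" + pop])
        an = int(info["AN_" + pop])
    except (KeyError, ValueError):
        return None
    return ac / an if an > 0 else None


def annotate(cohort_lines, reference_lines, pops):
    """Return {key: {pop: AF or None}} for every variant in the cohort VCF.

    Only the cohort keys are held in memory; the (large) reference is streamed.
    """
    result = {key: {p: None for p in pops} for key, _ in iter_records(cohort_lines)}
    for key, info in iter_records(reference_lines):
        if key in result:
            result[key] = {p: pop_af(info, p) for p in pops}
    return result
```

For real files this would be called as `with open_text("cohort.vcf.gz") as c, open_text("ExAC.GRCh38.vcf.gz") as r: ann = annotate(c, r, ["NFE", "FIN", "AFR", "EAS", "SAS"])` (file names hypothetical). For large inputs, `bcftools annotate -a ref.vcf.gz -c INFO/...` should do the same job much faster; the sketch is mainly there to show what the matching does.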

fin swimmer

(written 16 months ago by finswimmer; modified 16 months ago)

Thanks @finswimmer - just updated my post to clarify what I'm looking for as output.

(written 16 months ago by gaelgarcia05)

Just use ANNOVAR: it outputs allele frequencies for all of these populations and it supports hg38. It even includes a script (convert2annovar.pl) that converts VCF to the input format ANNOVAR requires.
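If you go the ANNOVAR route, the frequencies end up as columns of the tab-delimited `*_multianno.txt` output. Below is a small sketch for pulling out the columns you want. The `ExAC_*` column names are what I'd expect from the `exac03` database and, like the `AF_*` output names and the function names, should be treated as assumptions; check the header of your own file. As far as I know these values are frequencies of the listed ALT allele rather than MAFs, and missing values are written as whatever you pass to `-nastring` (often `.`).

```python
import csv
from typing import Iterable, List, Optional

# Assumed mapping from output names to ANNOVAR exac03 columns;
# check the header of your own *_multianno.txt.
POP_COLUMNS = {
    "AF_european": "ExAC_NFE",
    "AF_finnish": "ExAC_FIN",
    "AF_african": "ExAC_AFR",
    "AF_south_asian": "ExAC_SAS",
    "AF_east_asian": "ExAC_EAS",
}


def to_float(value: Optional[str]) -> Optional[float]:
    """Treat '.', empty strings and absent columns as 'not in the database'."""
    return None if value in (None, "", ".") else float(value)


def read_multianno(lines: Iterable[str]) -> List[dict]:
    """Extract the chosen population columns from a tab-delimited multianno file."""
    rows = []
    for rec in csv.DictReader(lines, delimiter="\t"):
        row = {"variant": f"{rec['Chr']}:{rec['Start']}:{rec['Ref']}>{rec['Alt']}"}
        for out_name, column in POP_COLUMNS.items():
            row[out_name] = to_float(rec.get(column))
        rows.append(row)
    return rows
```

Usage: `with open("mydata.hg38_multianno.txt") as fh: rows = read_multianno(fh)` (file name hypothetical). Adding the 1000 Genomes columns is just a matter of extending `POP_COLUMNS` with whatever names appear in your header.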

For the allele frequencies in your own sample cohort, you can calculate the AF (allele frequency) INFO tag and write it directly into your VCF using BCFtools; see: How to use bcftools to calculate AF INFO field from AC and AN in VCF?
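The arithmetic behind the AF tag is simply AF = AC/AN for each ALT allele, where AC has one comma-separated count per ALT and AN is the number of called alleles at the site. BCFtools' fill-tags plugin (`bcftools +fill-tags in.vcf.gz -- -t AF`) can compute it from the genotypes for you. The sketch below only illustrates the calculation on a single INFO string; `af_from_info` and `add_af` are hypothetical helpers, and it assumes AC and AN are both present.

```python
from typing import Dict, List, Optional


def parse_info(info: str) -> Dict[str, str]:
    """Split 'AC=3;AN=10;DB' into {'AC': '3', 'AN': '10', 'DB': ''}."""
    fields = {}
    for item in info.split(";"):
        name, _, value = item.partition("=")
        fields[name] = value
    return fields


def af_from_info(info: str) -> List[Optional[float]]:
    """One AF per ALT allele: each AC value divided by AN (None if AN is 0)."""
    fields = parse_info(info)
    an = int(fields["AN"])
    counts = [int(c) for c in fields["AC"].split(",")]
    return [c / an if an > 0 else None for c in counts]


def add_af(info: str) -> str:
    """Return INFO with an AF tag appended, replacing any existing AF."""
    afs = af_from_info(info)
    kept = [item for item in info.split(";") if not item.startswith("AF=")]
    # '.' is the VCF missing-value marker
    value = ",".join("." if af is None else f"{af:.6g}" for af in afs)
    return ";".join(kept + ["AF=" + value])
```

For example, `add_af("AC=3;AN=10;DP=50")` gives `AC=3;AN=10;DP=50;AF=0.3`.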

To then extract the AF in an 'easy' format, use bcftools query; see, for example: A: Extracting certain columns from VCF file
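To get from there to the table in the question, here is a sketch that reads `bcftools query -f '%ID\t%INFO/AF\n'` output, converts each frequency to a MAF with MAF = min(AF, 1 - AF), and adds one column per population. `pop_afs` is a hypothetical dictionary you would fill from ANNOVAR or from the lifted-over files, and the code assumes biallelic sites (one AF value per line). One caveat: the minor allele is not necessarily the same allele in every population, so comparing ALT allele frequencies directly may be less ambiguous than comparing MAFs.

```python
from typing import Dict, Iterable, List, Optional


def maf(af: Optional[float]) -> Optional[float]:
    """Minor allele frequency from the ALT allele frequency of a biallelic site."""
    return None if af is None else min(af, 1.0 - af)


def fmt(value: Optional[float]) -> str:
    """Four significant digits, or NA for variants without a frequency."""
    return "NA" if value is None else f"{value:.4g}"


def build_table(query_lines: Iterable[str],
                pop_afs: Dict[str, Dict[str, Optional[float]]],
                pops: List[str]) -> List[str]:
    r"""Build the tab-separated snpID / MAF_mysamples / MAF_<pop> table.

    query_lines: lines like "rs123<TAB>0.0312", e.g. from
        bcftools query -f '%ID\t%INFO/AF\n' cohort.vcf.gz
    pop_afs: hypothetical {snpID: {population: ALT allele frequency}}.
    """
    rows = ["\t".join(["snpID", "MAF_mysamples"] + [f"MAF_{p}" for p in pops])]
    for line in query_lines:
        snp_id, af_text = line.rstrip("\n").split("\t")
        own_af = None if af_text == "." else float(af_text)
        per_pop = pop_afs.get(snp_id, {})
        cells = [snp_id, fmt(maf(own_af))] + [fmt(maf(per_pop.get(p))) for p in pops]
        rows.append("\t".join(cells))
    return rows
```

Writing `"\n".join(build_table(...))` to a file gives a table you can open in R or a spreadsheet.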

(written 16 months ago by Kevin Blighe)