Question: Extracting super-population allele frequencies from 1000 genomes (phase 3) data
gravatar for gokberk
3 months ago by
gokberk60 wrote:

Hi all,

I'm interested in super-population level allele frequencies of SNPs on a gene. I cropped my variants of interest and made a smaller vcf file. Now, I'd like to extract the allele frequencies for these variants.

I've been following the instructions here, even though it gives a "micro-populations" level solution. However, I receive a bunch of warnings and an empty vcf file when I run the following command:

vcf-subset -c CEU.samples.list gene_of_interest_variants.vcf.gz | fill-an-ac |
    bgzip -c > CEU.myGene.vcf.gz

Here are the warnings:

No such column in the VCF file: "NA20526"
 at /usr/local/bin/vcf-subset line 21.
    main::error('No such column in the VCF file: "NA20526"\x{a}') called at /usr/local/bin/vcf-subset line 111
    main::check_columns('HASH(0x1ee51a8)', 'Vcf4_1=HASH(0x1eed3f0)', 'ARRAY(0x2384a78)') called at /usr/local/bin/vcf-subset line 133
    main::vcf_subset('HASH(0x1ee51a8)') called at /usr/local/bin/vcf-subset line 12
Broken VCF header, no column names?
 at /usr/local/share/perl/5.18.2/ line 172.
    Vcf::throw('Vcf4_2=HASH(0x21e5a90)', 'Broken VCF header, no column names?') called at /usr/local/share/perl/5.18.2/ line 867
    VcfReader::_read_column_names('Vcf4_2=HASH(0x21e5a90)') called at /usr/local/share/perl/5.18.2/ line 602
    VcfReader::parse_header('Vcf4_2=HASH(0x21e5a90)') called at /usr/local/bin/fill-an-ac line 45
    main::fill_an_ac(undef) called at /usr/local/bin/fill-an-ac line 9

Is it basically because my down-sampled vcf is not well formatted? If so, is there a better way to crop the variants that I'm interested in? (I used vcftools with --from-bp & --to-bp options)

Any help is much appreciated. Cheers - Gökberk

1000genomes vcf • 163 views
ADD COMMENTlink written 3 months ago by gokberk60

As I knew, you don't need to do it by yourself. Gnomad have done it in his VCF files and you just need to extract this information.

ADD REPLYlink modified 3 months ago • written 3 months ago by Shicheng Guo7.8k

Thanks Shicheng. I've just managed to do it using bcftools query. In case someone needs it, here is the command I used:

bcftools query -f '%CHROM %POS %EAS_AF %AMR_AF %AFR_AF %EUR_AF %SAS_AF \n' myFile.bcf
ADD REPLYlink written 3 months ago by gokberk60
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1795 users visited in the last hour