Question: Extracting super-population allele frequencies from 1000 genomes (phase 3) data
2
gravatar for gokberk
3 months ago by
gokberk60
gokberk60 wrote:

Hi all,

I'm interested in super-population level allele frequencies of SNPs on a gene. I cropped my variants of interest and made a smaller vcf file. Now, I'd like to extract the allele frequencies for these variants.

I've been following the instructions here, even though it gives a "micro-populations" level solution. However, I receive a bunch of warnings and an empty vcf file when I run the following command:

vcf-subset -c CEU.samples.list gene_of_interest_variants.vcf.gz | fill-an-ac |
    bgzip -c > CEU.myGene.vcf.gz

Here are the warnings:

No such column in the VCF file: "NA20526"
 at /usr/local/bin/vcf-subset line 21.
    main::error('No such column in the VCF file: "NA20526"\x{a}') called at /usr/local/bin/vcf-subset line 111
    main::check_columns('HASH(0x1ee51a8)', 'Vcf4_1=HASH(0x1eed3f0)', 'ARRAY(0x2384a78)') called at /usr/local/bin/vcf-subset line 133
    main::vcf_subset('HASH(0x1ee51a8)') called at /usr/local/bin/vcf-subset line 12
Broken VCF header, no column names?
 at /usr/local/share/perl/5.18.2/Vcf.pm line 172.
    Vcf::throw('Vcf4_2=HASH(0x21e5a90)', 'Broken VCF header, no column names?') called at /usr/local/share/perl/5.18.2/Vcf.pm line 867
    VcfReader::_read_column_names('Vcf4_2=HASH(0x21e5a90)') called at /usr/local/share/perl/5.18.2/Vcf.pm line 602
    VcfReader::parse_header('Vcf4_2=HASH(0x21e5a90)') called at /usr/local/bin/fill-an-ac line 45
    main::fill_an_ac(undef) called at /usr/local/bin/fill-an-ac line 9

Is it basically because my down-sampled vcf is not well formatted? If so, is there a better way to crop the variants that I'm interested in? (I used vcftools with --from-bp & --to-bp options)

Any help is much appreciated. Cheers - Gökberk

1000genomes vcf • 163 views
ADD COMMENTlink written 3 months ago by gokberk60
1

As I knew, you don't need to do it by yourself. Gnomad have done it in his VCF files and you just need to extract this information.

https://gnomad.broadinstitute.org/downloads

ADD REPLYlink modified 3 months ago • written 3 months ago by Shicheng Guo7.8k
1

Thanks Shicheng. I've just managed to do it using bcftools query. In case someone needs it, here is the command I used:

bcftools query -f '%CHROM %POS %EAS_AF %AMR_AF %AFR_AF %EUR_AF %SAS_AF \n' myFile.bcf
ADD REPLYlink written 3 months ago by gokberk60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1795 users visited in the last hour