Filtering down 1000 Genomes data just for a set of SNPs
8.6 years ago
devenvyas ▴ 740

I have some SNP data, and I want to download the 1000 Genomes VCF files so that I can extract the ancestral alleles for my sites. I know the Data Slicer tool can subset the data by population or by individual (to avoid having to download everything). Does anyone know of a way to do the same kind of subsetting, but for specific sites in the genome?

(My alternative would be to get all 22 VCFs onto an HPC cluster, filter them with a script, and then use another script to isolate the ancestral allele.)

vcf SNP • 2.2k views
8.6 years ago

If you only need the ancestral alleles, you can get them from the WGS sites file. A fairly simple way to get them is to use bcftools query:

bcftools query -f '%CHROM\t%POS\t%ID\t%AA\n' ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz > 1000g.ancestral_alleles.txt

Note that the ancestral allele in the last column comes in this format:

##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele. Format: AA|REF|ALT|IndelType. AA: Ancestral allele, REF:Reference Allele, ALT:Alternate Allele, IndelType:Type of Indel (REF, ALT and IndelType are only defined for indels)">
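
Since the AA value is a pipe-delimited string, the bare ancestral allele is its first field. A minimal sketch of stripping the rest, assuming the four-column tab-separated output of the query above; the function name `extract_ancestral` and the sample record are made up for illustration:

```shell
# Hypothetical helper: reduce the AA column (4th field) to the bare ancestral allele.
# Reads tab-separated CHROM, POS, ID, AA lines on stdin and writes the same columns.
extract_ancestral() {
    awk 'BEGIN { FS = OFS = "\t" }
         {
             # AA looks like "AA|REF|ALT|IndelType"; keep only the first part.
             split($4, parts, "[|]")
             $4 = parts[1]
             print
         }'
}

# Made-up record for illustration (not real 1000 Genomes data):
printf '1\t12345\trs0000\tC|||\n' | extract_ancestral
```

The same function could be applied to the full output, e.g. `extract_ancestral < 1000g.ancestral_alleles.txt`.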

You can add any filtering option (by region, by rs ID, ...) to make this command as complex as needed. You'll be querying the file remotely and retrieving only what you need.
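
As one possible way to restrict the query to your own sites, here is a sketch that turns a SNP list into a regions file and passes it to `bcftools query -R`. It assumes your list is tab-separated CHROM and POS on GRCh37 coordinates (the build this release uses), possibly with a `chr` prefix that the 1000 Genomes files do not have. The function names and the file names in the usage note are hypothetical.

```shell
SITES_URL='ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz'

# Hypothetical helper: normalise a tab-separated CHROM/POS list read from stdin.
# Strips a leading "chr", then sorts by chromosome and position and drops duplicates.
make_regions() {
    sed 's/^chr//' | sort -t "$(printf '\t')" -k1,1 -k2,2n -u
}

# Hypothetical helper: query the remote sites file for the positions in a regions file.
# $1 is a two-column (CHROM, POS) tab-delimited file, which -R accepts.
query_ancestral() {
    bcftools query -R "$1" -f '%CHROM\t%POS\t%ID\t%AA\n' "$SITES_URL"
}
```

Usage would look something like `make_regions < my_snps.tsv > regions.tsv` followed by `query_ancestral regions.tsv > my_sites.ancestral_alleles.txt`. Region-based access depends on an index being available alongside the remote file. To filter by rs ID instead, bcftools expressions also allow `-i 'ID=@ids.txt'` with one ID per line.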


Thanks! That worked!
