Question: Filtering down 1000 Genomes data just for a set of SNPs
devenvyas (Stony Brook) wrote, 5.1 years ago:

I have some SNP data, and I want to download 1000 Genomes VCF files so that I can extract the ancestral alleles for my sites. I know the Data Slicer tool can subset the files by population and by individual (to avoid having to download everything). Does anyone know of a way to do the same kind of subsetting by site, so that I only retrieve specific positions in the genome?

(My alternative would be to get all 22 chromosome VCFs onto an HPC cluster, filter them down to my sites with one script, and then isolate the ancestral allele with another.)



snp vcf • 1.6k views
Jorge Amigo (Santiago de Compostela, Spain) wrote, 5.1 years ago:

if you only need the ancestral alleles, you can get them from the 1000 Genomes wgs sites file, which lists every site without the per-sample genotypes. a fairly simple way of getting them is to use bcftools query:

# SITES_VCF is a placeholder for the path or URL of the wgs sites file
bcftools query -f '%CHROM\t%POS\t%ID\t%AA\n' "$SITES_VCF" > 1000g.ancestral_alleles.txt

note that the ancestral allele in that last column comes in this format, as declared in the VCF header:

##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele. Format: AA|REF|ALT|IndelType. AA: Ancestral allele, REF:Reference Allele, ALT:Alternate Allele, IndelType:Type of Indel (REF, ALT and IndelType are only defined for indels)">
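Because the AA value is pipe-delimited, the allele itself is the first subfield. A minimal sketch of stripping the rest with awk, assuming a tab-separated table in the CHROM/POS/ID/AA layout of the query above; the file names and the sample rows are made up for illustration:

```shell
# Hypothetical sample rows in the CHROM, POS, ID, AA layout (tab-separated).
# The positions, IDs and alleles are invented, not real 1000 Genomes records.
printf '1\t100\trs_example1\tC|||\n1\t200\trs_example2\t.|||\n' > aa_sample.txt

# Split the 4th column on "|" and keep only the first subfield (the allele).
awk 'BEGIN { FS = OFS = "\t" } { split($4, parts, "|"); $4 = parts[1]; print }' \
    aa_sample.txt > aa_sample.clean.txt
```

The same awk line could be pointed at 1000g.ancestral_alleles.txt once that file exists; rows where no ancestral allele was called keep whatever placeholder the source file uses.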

you can add any filtering option (by regions, by rs ids,...) to make this command as complex as needed. you'll be querying the file remotely and retrieving only what you need.
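As one illustration of filtering by rs IDs, here is a minimal sketch that subsets an already retrieved table to a list of wanted SNPs using awk. The file names my_snps.txt, aa_table.txt and aa_subset.txt and all sample values are hypothetical. bcftools' own include expressions (for example `-i 'ID=@my_snps.txt'`) should allow a similar selection at query time, but check that against the documentation for your installed version:

```shell
# Hypothetical inputs: wanted rs IDs (one per line) and a table in the
# CHROM, POS, ID, AA layout. All values are invented for illustration.
printf 'rs_example2\nrs_example3\n' > my_snps.txt
printf '1\t100\trs_example1\tA|||\n1\t200\trs_example2\tg|||\n2\t300\trs_example3\tT|||\n' > aa_table.txt

# First pass (NR == FNR) loads the wanted IDs into an array; second pass
# prints only the rows whose 3rd column (ID) is one of them.
awk -F '\t' 'NR == FNR { want[$1]; next } $3 in want' my_snps.txt aa_table.txt > aa_subset.txt
```

Subsetting locally like this means retrieving the full table first, so it trades the "retrieve only what you need" benefit for not depending on any particular bcftools option.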


Thanks! That worked!

(reply by devenvyas, 5.1 years ago)