Finding only biallelic SNPs from the 1000GP BCF files
7.6 years ago

I want to run some tests on the 1000 Genomes Project Phase 3 VCF files.

I downloaded the files from here: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/

From these files I want to keep only biallelic SNPs, so I plan to use the --max-alleles option in vcftools. I've been reading the supporting information to work out which records are SNPs, but I couldn't find a list of them anywhere.

Here's how the file looks:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NA18525 NA18526 
22  16050075    .   A   G   100 PASS    .   GT  0|0 0|0
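For illustration, here is a minimal Python sketch of the filter described above, applied directly to records like the one shown. It assumes that a biallelic SNP is a record whose REF and ALT are each a single A/C/G/T base, and that multiallelic sites are written as one record with comma-separated ALT alleles. The function names `is_biallelic_snp` and `filter_biallelic_snps` are made up for this example and are not part of vcftools or any other tool.

```python
BASES = {"A", "C", "G", "T"}


def is_biallelic_snp(record: str) -> bool:
    """Return True if a VCF data line has a single-base REF and a single-base ALT."""
    # Header lines and blank lines are not variant records.
    if not record.strip() or record.startswith("#"):
        return False
    # Only the first five columns (CHROM, POS, ID, REF, ALT) are needed.
    fields = record.split(None, 5)
    if len(fields) < 5:
        return False
    ref, alt = fields[3].upper(), fields[4].upper()
    # A comma-separated ALT (e.g. "G,T") or an indel allele (e.g. "AT")
    # is not a single base, so it fails the membership test.
    return ref in BASES and alt in BASES


def filter_biallelic_snps(lines):
    """Yield header lines unchanged and keep only biallelic SNP records."""
    for line in lines:
        if line.startswith("#") or is_biallelic_snp(line):
            yield line
```

The generator accepts any iterable of lines, so it should work with an open text file handle or with `gzip.open(path, "rt")` for a compressed VCF. Because it only inspects REF and ALT, it does not rely on any annotation in the INFO column.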

EXTRA: The file description also says there should be information on the ancestral state of each SNP, but I couldn't find it in the VCF files either.

Here's the header line that describes it:

##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele. Format: AA|REF|ALT|IndelType. AA: Ancestral allele, REF:Reference Allele, ALT:Alternate Allele, IndelType:Type of Indel (REF, ALT and IndelType are only defined for indels)">
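Going by that header definition, the ancestral allele would be the first `|`-separated part of an `AA=` entry in the INFO column. The sketch below shows one way to pull it out of an INFO string in Python. The helper name `ancestral_allele` and the example INFO strings in the comments are hypothetical, chosen only to match the declared format, and are not copied from the actual files.

```python
from typing import Optional


def ancestral_allele(info: str) -> Optional[str]:
    """Return the ancestral allele from a VCF INFO string, or None if it is absent."""
    # INFO is a semicolon-separated list of KEY=VALUE entries (or "." when empty).
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if key == "AA" and sep:
            # Declared format: AA|REF|ALT|IndelType; only the first part is wanted.
            allele = value.split("|")[0]
            return allele or None
    return None


# Hypothetical INFO strings, for illustration only:
#   ancestral_allele("AC=1;AA=G|||;VT=SNP")  gives "G"
#   ancestral_allele(".")                    gives None
```

For a record whose INFO column is just `.`, as in the excerpt above, the function returns None, which matches the observation that no ancestral state is visible there.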