I want to run some tests on the 1000 Genomes Project Phase 3 VCF files.
I downloaded the files from here: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/
From these files I only want to keep biallelic SNPs, so I want to use the --max-alleles option in vcftools. I've been reading the supporting information to work out how to select only the SNPs, but I couldn't find a list of them anywhere.
Here's how the file looks:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA18525 NA18526
22 16050075 . A G 100 PASS . GT 0|0 0|0
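To make the goal concrete, here is a minimal Python sketch of what "biallelic SNP" means at the level of a single record. It assumes the real files are tab-delimited, as the VCF specification requires, with REF in column 4 and ALT in column 5. The function names `is_biallelic_snp` and `filter_biallelic_snps` are hypothetical helpers written for illustration; they are not part of vcftools or any other tool.

```python
# Sketch only: assumes tab-delimited VCF data lines (REF = column 4, ALT = column 5).
BASES = {"A", "C", "G", "T"}


def is_biallelic_snp(record_line):
    """Return True if a VCF data line has one single-base REF and exactly one single-base ALT."""
    fields = record_line.rstrip("\n").split("\t")
    ref = fields[3].upper()
    alt_alleles = fields[4].upper().split(",")
    # More than one ALT allele means the site is multiallelic.
    if len(alt_alleles) != 1:
        return False
    # Indels and symbolic alleles (e.g. <CN0>) are longer than one base, so they fail here.
    return ref in BASES and alt_alleles[0] in BASES


def filter_biallelic_snps(lines):
    """Yield header lines unchanged, plus data lines that are biallelic SNPs."""
    for line in lines:
        if line.startswith("#") or is_biallelic_snp(line):
            yield line
```

With a file opened in text mode, `filter_biallelic_snps(handle)` could be written straight to a new file. For files of this size a dedicated tool is likely to be far faster; the sketch is only meant to pin down the criterion.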
EXTRA: The description of the file also says there should be information on the ancestral state of each SNP, but I couldn't find that information in the VCF files either.
Here's the header line that describes it:
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele. Format: AA|REF|ALT|IndelType. AA: Ancestral allele, REF:Reference Allele, ALT:Alternate Allele, IndelType:Type of Indel (REF, ALT and IndelType are only defined for indels)">
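Going by that header line, the ancestral allele would be the first `|`-separated piece of an `AA=` entry in the INFO column (column 8) of each record. The sketch below assumes exactly that layout and nothing more. The helper name `parse_ancestral_allele` is hypothetical, and a record whose INFO column is just `.`, like the sample record shown earlier, simply has no AA entry to return.

```python
# Sketch only: assumes INFO entries are ';'-separated and AA follows the
# "AA|REF|ALT|IndelType" layout described in the ##INFO header line.
def parse_ancestral_allele(info):
    """Return the ancestral allele from a VCF INFO string, or None if absent or unknown."""
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if key == "AA" and sep:
            # The first '|'-separated piece is the ancestral allele itself.
            allele = value.split("|")[0]
            return allele if allele not in ("", ".") else None
    return None
```

For example, `parse_ancestral_allele(".")` returns `None`, while a made-up INFO string such as `"AC=1;AA=G|||"` would give `"G"`.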