I read this excellent post by Stephen on getting data from 1000 Genomes with tabix, but it does not seem to be working for me. I use tabix to get the data in the following manner:
tabix -fh ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr22.phase1_integrated_calls.20101123.snps_indels_svs.genotypes.vcf.gz 22:1000000-10000000 > ~/delete.vcf
This produces a VCF file, but the file seems to contain only header lines and no variant records, like so:
##INFO=<ID=SNPSOURCE,Number=.,Type=String,Description="indicates if a snp was called when analysing the low coverage or exome alignment data">
##reference=GRCh37
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00096...
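A quick way to confirm that the file really has no variant records is to count the lines that do not start with `#`. This is only a minimal sketch; `count_records` is a hypothetical helper name, not a tabix or vcftools command, and it assumes an uncompressed VCF.

```shell
# Count the non-header lines (variant records) in an uncompressed VCF.
# count_records is a hypothetical helper name used only for illustration.
count_records() {
    # Header and meta lines start with '#'; everything else is a record.
    # "n + 0" makes awk print 0 rather than an empty string when nothing matches.
    awk '!/^#/ { n++ } END { print n + 0 }' "$1"
}

# Usage, assuming the file written by the tabix command above:
# count_records ~/delete.vcf
```

A result of 0 means tabix returned the header only for the requested region.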
So after running vcftools, I obviously get nothing, since there is no genotype data:
vcftools --vcf ~/delete.vcf --freq --out ~/delete.txt
VCFtools - v0.1.7
(C) Adam Auton 2009
Parameters as interpreted:
--vcf /home/delahar/delete.vcf
--freq
--out /home/delahar/delete.txt
Reading Index file.
File contains 0 entries and 1092 individuals.
Applying Required Filters.
After filtering, kept 1092 out of 1092 Individuals
After filtering, kept 0 out of a possible 0 Sites
Error:No data left for analysis!
I'm guessing this is an issue with the way I'm using tabix. Ultimately I want to extract the records that have VT=SV in their INFO column, so any extra help with that would be greatly appreciated.
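For context, here is a rough sketch of the kind of filter I have in mind once the download returns actual records. It assumes that VT is stored as a key in the INFO column (field 8) of a tab-separated, uncompressed VCF; `filter_sv` is a hypothetical helper name, not part of tabix or vcftools.

```shell
# Keep all header lines plus the records whose INFO column contains VT=SV.
# filter_sv is a hypothetical helper name used only for illustration.
# Assumption: VT is an INFO key, and INFO is the 8th tab-separated field.
filter_sv() {
    awk -F'\t' '
        /^#/ { print; next }               # pass header and meta lines through
        $8 ~ /(^|;)VT=SV(;|$)/ { print }   # match VT=SV as a whole INFO entry
    ' "$1"
}

# Usage, assuming ~/delete.vcf already contains variant records:
# filter_sv ~/delete.vcf > ~/delete.sv.vcf
```

Anchoring the pattern on `;` or the field boundaries is meant to avoid matching a longer value that merely begins with SV.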