I'm trying to retrieve genotype data for a given sample from the 1000 Genomes FTP repository, following their guidelines: http://www.1000genomes.org/faq/how-do-i-get-sub-section-vcf-file
So I tried to execute something like: tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr1.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz 1 | perl /nfs/1000g-work/G1K/work/bin/vcftools/perl/vcf-subset -c NA12890
This command fails in vcf-subset with an error like: Wrong number of fields; expected 1101, got 559. The offending line was: [...]
At first sight you might think the problem is in the remote VCF file itself, but the line where it fails is different on every execution, and the offending lines look truncated. Has anybody had the same problem?
It seems to me that tabix may be truncating lines when the network connection is unreliable, but that is just a guess... Any other ideas?
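To check whether lines really arrive truncated, independently of vcf-subset, the field count of every data line can be compared against the #CHROM header line. This is only a minimal sketch: check_vcf_fields is a hypothetical helper name, and it assumes plain tab-separated VCF text on stdin.

```shell
# check_vcf_fields: hypothetical helper, not part of tabix or vcftools.
# Reads VCF text on stdin and reports every data line whose number of
# tab-separated fields differs from the #CHROM header line.
# Exit status is 1 if any mismatching line was found, 0 otherwise.
check_vcf_fields() {
  awk -F'\t' '
    /^##/     { next }                   # skip meta-information lines
    /^#CHROM/ { expected = NF; next }    # header defines the expected column count
    NF != expected {
      printf "line %d: expected %d fields, got %d\n", NR, expected, NF
      bad++
    }
    END { exit (bad > 0) }
  '
}

# Intended usage (needs network access, so it is left commented out):
# tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr1.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz 1 | check_vcf_fields
```

If the reported line numbers change between runs, that would point at the transfer rather than at the file on the server.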
PS: I'm going to try downloading these huge files, although I was hoping to avoid that...