In my previous question, I asked how to take a VCF file and a reference genome from the 1000 Genomes Project and obtain a FASTA file containing the genome sequence of each individual in the VCF, according to the SNPs that individual carries. The answers showed that GATK (FastaAlternateReferenceMaker) and vcftools (vcf-consensus, http://www.1000genomes.org/faq/are-there-any-fasta-files-containing-1000-genomes-variants-or-haplotypes/) can do something similar. However, I'd like to skip indels when creating these sequences. Do existing tools have options to do this?
You might try simply removing the indels from your VCF file before passing it to these tools, but you could also consider the FastaVariant class in pyfaidx. See A: Make fasta file from SNPs in two vcf files for an example.
Note, however, that the FastaVariant approach is currently too slow for whole-chromosome access.
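The first suggestion, dropping indels from the VCF before running a consensus tool, can be sketched in plain Python as follows. This is a minimal illustration rather than a replacement for a dedicated VCF toolkit: the function names `is_snp_record` and `drop_indels` are made up for this example, it assumes an uncompressed, tab-delimited VCF, and it discards a multi-allelic record entirely if any one of its alleles is not a single base.

```python
import io

# Alleles accepted as single-nucleotide; anything else (indels, '*',
# '.', symbolic alleles such as <DEL>) causes the record to be dropped.
BASES = set("ACGTN")


def is_snp_record(line):
    """Return True if REF and every ALT allele on this VCF data line is one base."""
    fields = line.rstrip("\n").split("\t")
    ref = fields[3]              # column 4: REF
    alts = fields[4].split(",")  # column 5: comma-separated ALT alleles
    return all(len(a) == 1 and a.upper() in BASES for a in [ref] + alts)


def drop_indels(vcf_in, vcf_out):
    """Copy header lines and SNP-only records from vcf_in to vcf_out.

    Both arguments are text file objects. Returns the number of data
    records that were kept.
    """
    kept = 0
    for line in vcf_in:
        if line.startswith("#"):
            vcf_out.write(line)  # keep meta-information and column header
        elif line.strip() and is_snp_record(line):
            vcf_out.write(line)
            kept += 1
    return kept


if __name__ == "__main__":
    # Tiny in-memory VCF (hypothetical records) to show the call pattern.
    demo = io.StringIO(
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
        "1\t100\t.\tA\tG\t.\tPASS\t.\n"
        "1\t200\t.\tAT\tA\t.\tPASS\t.\n"
    )
    out = io.StringIO()
    print(drop_indels(demo, out), "SNP record(s) kept")
```

For real data you would pass two opened files instead of the in-memory buffers, then hand the filtered VCF to vcf-consensus or FastaAlternateReferenceMaker as before.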