I would like to take a VCF file and a reference genome from the 1000 Genomes Project and produce a FASTA file containing one genome sequence per individual in the VCF, built by applying each individual's SNPs to the reference. Can VCFtools do this? If not, which tools can?
I have written a Python script that goes through the 84 million SNPs in the file and writes a FASTA file. A test run on 10,000 SNPs took several hours to produce output, and the run on all 84 million SNPs has now been going for several days. I'm looking for a more efficient way to get a FASTA file from a VCF.
I am looking to skip indels.
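To make the task concrete, here is a minimal Python sketch of a single-pass approach: keep one mutable copy of the reference per sample and overwrite bases in place as each VCF line is read, skipping any site whose REF or ALT is not a single A/C/G/T base. The function names `is_snp` and `apply_snps` are hypothetical, and the sketch assumes a single chromosome, tab-separated VCF lines, and that only the first allele of each genotype (one haplotype) is used. It is an illustration under those assumptions, not the script described above.

```python
def is_snp(ref, alts):
    """True when REF and every ALT are single A/C/G/T bases (no indels, no symbolic alleles)."""
    bases = set("ACGT")
    return ref in bases and all(alt in bases for alt in alts)


def apply_snps(reference, vcf_lines):
    """Return {sample: sequence} for one chromosome.

    Assumption: only the first allele of each genotype is applied,
    so each sample gets a single haplotype sequence.
    """
    samples = []
    seqs = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # blank line or meta-information header
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            # One mutable copy of the reference per sample.
            seqs = {s: bytearray(reference, "ascii") for s in samples}
            continue
        pos = int(fields[1])  # VCF positions are 1-based
        ref = fields[3]
        alts = fields[4].split(",")
        if not is_snp(ref, alts):
            continue  # skip indels and anything that is not a plain SNP
        alleles = [ref] + alts
        for sample, call in zip(samples, fields[9:]):
            genotype = call.split(":")[0]
            first = genotype.replace("|", "/").split("/")[0]
            if first in (".", "0"):
                continue  # missing call or reference allele: nothing to change
            seqs[sample][pos - 1] = ord(alleles[int(first)])
    return {s: bytes(b).decode("ascii") for s, b in seqs.items()}


def to_fasta(seqs, width=60):
    """Format {name: sequence} as FASTA text with fixed-width lines."""
    out = []
    for name, seq in seqs.items():
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

Because this holds a full copy of the reference for every sample, it only suits a small region or a handful of samples at a time; for whole chromosomes the same loop would need to be run per sample or per batch of samples.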
EDIT: VCFtools' vcf-to-tab converts a VCF file into a tab-delimited file, and there is a separate script that turns the tab file into a FASTA file: https://code.google.com/archive/p/vcf-tab-to-fasta/