Question: Using VCFtools to obtain fasta files
3.1 years ago by
severalorks90 wrote:

I would like to take a VCF file and a reference genome from the 1000 Genomes Project and produce a FASTA file containing the genome sequence of each individual in the VCF, based on the SNPs that individual carries. Can VCFtools do this? If not, what tools can?

I have written a Python script that goes through the 84 million SNPs in the file and outputs a FASTA file. A test run on 10,000 SNPs took several hours to produce output, and the full run on all 84 million SNPs has now been going for several days. I'm looking for a more efficient way to get a FASTA file from a .vcf.

I am looking to skip indels.
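
For reference, here is a minimal Python sketch of a single-pass approach, not a tested replacement for the script described above. It assumes one chromosome at a time, a reference sequence already loaded as a string, and genotypes in the first FORMAT field (GT). The names `is_snp`, `apply_snps`, `keep` and `haplotype` are made up for this illustration. Each record is read once, indels are skipped, and SNPs are written straight into a mutable per-sample copy of the reference, so the cost is linear in the number of records.

```python
def is_snp(ref, alts):
    """True when REF and every ALT allele are single bases (so no indels)."""
    return len(ref) == 1 and all(len(a) == 1 and a in "ACGT" for a in alts)


def apply_snps(reference, vcf_lines, keep=None, haplotype=0):
    """Return {sample: sequence} for one chromosome.

    reference -- chromosome sequence as a str (hypothetical input)
    vcf_lines -- iterable of VCF text lines for that chromosome
    keep      -- optional set of sample names; None keeps every sample
    haplotype -- which allele of each genotype to apply (0 or 1)
    """
    columns, seqs = [], {}
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Sample names start at the tenth column of the header line.
            columns = [(i, s) for i, s in enumerate(fields[9:], start=9)
                       if keep is None or s in keep]
            # One mutable copy of the reference per kept sample.
            seqs = {s: bytearray(reference, "ascii") for _, s in columns}
            continue
        pos, ref, alts = int(fields[1]), fields[3], fields[4].split(",")
        if not is_snp(ref, alts):
            continue  # skip indels and symbolic alleles
        alleles = [ref] + alts
        for i, sample in columns:
            gt = fields[i].split(":")[0].replace("/", "|").split("|")
            idx = gt[haplotype] if haplotype < len(gt) else gt[0]
            if idx not in (".", "0"):
                # VCF positions are 1-based; Python indexing is 0-based.
                seqs[sample][pos - 1] = ord(alleles[int(idx)])
    return {s: seq.decode("ascii") for s, seq in seqs.items()}
```

Because this keeps one byte per base for every kept sample, it is only practical with `keep` set to a handful of individuals, or when run one chromosome at a time.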

EDIT: VCFtools' vcf-to-tab converts a .vcf file into a tab-delimited file, and there's a script that then turns that tab file into a FASTA file.
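
The tab-to-FASTA step can be sketched in Python roughly as follows. This is an illustration, not the script mentioned above. It assumes the tab file has a header line of the form `#CHROM POS REF` followed by one column per sample, with genotypes written like `A/G`; check that layout against the actual output before relying on it. The function name `tab_to_fasta` is hypothetical. Note that the result is an alignment of the variable sites only, not whole-genome sequences, and heterozygous calls are written as IUPAC ambiguity codes.

```python
BASES = set("ACGT")
# IUPAC ambiguity codes for heterozygous calls (keys are sorted base pairs).
IUPAC = {"AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M"}


def tab_to_fasta(tab_lines):
    """Build a FASTA string of SNP sites from vcf-to-tab style lines."""
    samples, columns = [], []
    for line in tab_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#"):
            samples = fields[3:]          # assumed layout: CHROM, POS, REF, samples
            columns = [[] for _ in samples]
            continue
        ref = fields[2]
        calls = [c.replace("|", "/").split("/") for c in fields[3:]]
        alleles = {a for call in calls for a in call if a != "."}
        if ref not in BASES or any(a not in BASES for a in alleles):
            continue  # skip indels and anything that is not a plain base
        for col, call in zip(columns, calls):
            bases = sorted(set(call))
            if "." in bases:
                col.append("N")           # missing genotype
            elif len(bases) == 1:
                col.append(bases[0])      # homozygous
            else:
                col.append(IUPAC.get("".join(bases), "N"))
    return "".join(f">{s}\n{''.join(c)}\n" for s, c in zip(samples, columns))
```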

3.1 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum122k wrote:

GATK's FastaAlternateReferenceMaker?


I believe that's what I'm looking for. I'll look into it.

Reply written 3.1 years ago by severalorks

I looked into it and it works well for obtaining the alternate genome, but I'm looking for the sequence of each individual in the VCF file. For example, the VCF file gives the SNPs for individuals HG00097 and HG00099, and I'd like to get a separate sequence for each of them. Additionally, I'd like to skip indels, if possible. So far I've tried vcf-consensus, but it gave the error 'Broken VCF header', and I'm not entirely sure it would output what I need anyway. Is there a program that can do this?

Reply written 3.1 years ago by severalorks
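
The cause of the 'Broken VCF header' error is not given in the thread. As a hedged starting point, the Python sketch below checks two things the VCF specification requires: a `##fileformat=VCF...` first line, and a tab-delimited `#CHROM` line with the eight fixed columns, followed by `FORMAT` when samples are present. The function `header_problems` is hypothetical and does not reproduce whatever validation vcf-consensus performs.

```python
FIXED = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def header_problems(lines):
    """Return a list of human-readable problems found in a VCF header."""
    problems = []
    for number, line in enumerate(lines):
        if number == 0 and not line.startswith("##fileformat=VCF"):
            problems.append("first line is not a ##fileformat=VCF line")
        if line.startswith("##"):
            continue  # other meta-information lines are not checked here
        # The first non-## line must be the column header.
        cols = line.rstrip("\r\n").split("\t")
        if cols[:8] != FIXED:
            problems.append("#CHROM line missing, misordered, or not tab-delimited")
        elif len(cols) > 8 and cols[8] != "FORMAT":
            problems.append("ninth column is not FORMAT")
        break
    else:
        problems.append("no #CHROM column header line found")
    return problems
```

A header whose columns are separated by spaces rather than tabs, for instance after copying the file through a text editor, would be flagged by the second check.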