Question: VCF to FASTA
page20 (United States) wrote, 6.0 years ago:



I need to convert .vcf files from 1000 Genomes into FASTA files while maintaining phasing. I am currently doing this more or less manually in Excel. We tried Galaxy but couldn't figure out how to preserve the phasing from the VCF.
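For context on what "maintaining phasing" means in the file itself: in a VCF, each sample's GT value lists allele indices (0 = REF, 1 = first ALT, and so on), separated by `|` when the genotype is phased and by `/` when it is not. Below is a minimal Python sketch of reading that field. It assumes an uncompressed, tab-separated data line whose FORMAT column contains a GT key. The function names are hypothetical, and this is not how Galaxy or any particular tool does it:

```python
def parse_gt(gt):
    """Split a VCF GT value such as '0|1' or '1/0' into allele indices.

    Returns (indices, phased). A missing allele '.' becomes None.
    The call counts as phased only if it contains no '/' separator.
    """
    phased = "/" not in gt
    indices = tuple(
        None if a == "." else int(a)
        for a in gt.replace("/", "|").split("|")
    )
    return indices, phased


def sample_haplotype_alleles(vcf_line, sample_index):
    """Return (alleles, phased) for one sample on one VCF data line.

    alleles[0] is the allele on the first haplotype, alleles[1] on the second.
    sample_index is 0-based, counting sample columns after FORMAT.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alts = fields[3], fields[4].split(",")
    format_keys = fields[8].split(":")
    sample_values = fields[9 + sample_index].split(":")
    gt = sample_values[format_keys.index("GT")]
    indices, phased = parse_gt(gt)
    allele_strings = [ref] + alts  # index 0 = REF, 1.. = ALT alleles
    alleles = tuple(None if i is None else allele_strings[i] for i in indices)
    return alleles, phased


# A made-up record with three samples: 0|0, 1|0 (phased) and 1/1 (unphased).
line = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3\tGT:GQ\t0|0:48\t1|0:48\t1/1:43"
print(sample_haplotype_alleles(line, 1))  # (('A', 'G'), True)
print(sample_haplotype_alleles(line, 2))  # (('A', 'A'), False)
```

With the phased `1|0`, A belongs on haplotype 1 and G on haplotype 2. With an unphased call, the order of the two alleles carries no information about which haplotype each one sits on.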

Is there a fast and easy way to do this that I haven't found?

Unfortunately, I don't know any programming; otherwise I'm sure this would be much easier.


Thank you!


Tags: vcf, fasta
(Question written 6.0 years ago by page20; last modified 4.6 years ago by castelli.)

There are several Biostars entries addressing this issue; see, for instance, "Introducing Known Mutations (From A Vcf) Into A Fasta File", "Standard Genome Plus Vcf To Variant Genome", or "New Fasta Sequence From Reference Fasta And Variant Calls File?". GATK's FastaAlternateReferenceMaker is what I usually go for.

(Comment written 6.0 years ago by Jorge Amigo, 12k reputation.)
Matt Shirley (9.4k reputation; Cambridge, MA) wrote, 6.0 years ago:

Erik Garrison (the freebayes author) has a small tool that builds a consensus FASTA from the phased variants in a VCF. Take a look here:
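At its core, such a tool applies each sample's phased alleles to a reference, once per haplotype. The sketch below is not Erik Garrison's code. It is a minimal Python illustration that assumes three things: variants arrive as `(pos, ref, allele_hap1, allele_hap2)` with 1-based VCF positions, REF matches the reference string, and overlapping variants on the same haplotype can simply be skipped. Indels are handled by copying the reference up to each variant, emitting the ALT allele, and then jumping past the REF allele:

```python
def apply_variants(reference, variants, offset=1):
    """Build one haplotype from `reference` and (pos, ref, alt) variants.

    pos is 1-based as in VCF; `offset` is the VCF position of reference[0].
    Variants are applied left to right; one that starts inside a variant
    already applied on this haplotype is skipped (a simplification).
    """
    pieces = []
    cursor = 0  # first 0-based reference index not yet copied
    for pos, ref, alt in sorted(variants):
        start = pos - offset
        if start < cursor:
            continue  # overlaps an earlier variant on this haplotype
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"REF mismatch at position {pos}: expected {ref!r}")
        pieces.append(reference[cursor:start])  # unchanged stretch before the variant
        pieces.append(alt)                      # ALT replaces REF, so indels change length
        cursor = start + len(ref)               # jump past the REF allele
    pieces.append(reference[cursor:])
    return "".join(pieces)


def build_haplotypes(reference, phased_records, offset=1):
    """phased_records: iterable of (pos, ref, allele_hap1, allele_hap2).

    An allele equal to REF, or None (missing), leaves the reference unchanged.
    """
    records = list(phased_records)
    hap1 = [(p, r, a1) for p, r, a1, _ in records if a1 not in (None, r)]
    hap2 = [(p, r, a2) for p, r, _, a2 in records if a2 not in (None, r)]
    return (apply_variants(reference, hap1, offset),
            apply_variants(reference, hap2, offset))


reference = "ACGTACGTAC"  # toy sequence; position 1 is the first 'A'
records = [
    (2, "C", "T", "C"),    # SNV on haplotype 1 only
    (5, "A", "A", "AGG"),  # insertion on haplotype 2 only
    (7, "GT", "G", "G"),   # deletion on both haplotypes
]
print(build_haplotypes(reference, records))  # ('ATGTACGAC', 'ACGTAGGCGAC')
```

Because indels change lengths, the two haplotypes of one sample need not be the same length, and positions after an indel no longer line up with reference coordinates.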

castelli wrote, 4.6 years ago:

You can use vcfx. It creates a FASTA file with two sequences per sample by taking a reference sequence and substituting each variant at its correct position. It supports indels. Take a look here:
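Once the two haplotype sequences exist, writing them out as two FASTA records per sample is straightforward. Here is a small Python sketch that assumes the haplotypes have already been built. The `<sample>_hap1`/`<sample>_hap2` naming and the 60-column line wrapping are hypothetical choices made for illustration, not vcfx's actual output format:

```python
def format_fasta(named_sequences, width=60):
    """Return FASTA text for (name, sequence) pairs, wrapping sequence lines at `width`."""
    lines = []
    for name, seq in named_sequences:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def sample_fasta(sample, hap1, hap2, width=60):
    """Two records per sample; the _hap1/_hap2 suffixes are a hypothetical naming scheme."""
    return format_fasta([(f"{sample}_hap1", hap1), (f"{sample}_hap2", hap2)], width)


print(sample_fasta("sample1", "ACGT", "ACGGT", width=3), end="")
# >sample1_hap1
# ACG
# T
# >sample1_hap2
# ACG
# GT
```

Concatenating the output for every sample gives one multi-FASTA file with two entries per individual.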


This seems like interesting software, but please consider releasing the source code instead of a precompiled binary. The README states that the license is Creative Commons, so I see no reason to limit end users to a binary blob. Some people would also object to having to register with an email address and password (do you really need accounts, or could you just collect optional emails?) when we already have too many logins to remember. I also can't evaluate the software's functionality, because the publication is not open access. For these reasons I wouldn't trust results from this program: I have no way to know anything about its inner workings or quality.

(Reply written 4.6 years ago by Matt Shirley.)