Introducing Known Mutations (From A Vcf) Into A Fasta File
11.1 years ago
Travis ★ 2.8k

Hi,

This could probably be coded easily enough but I don't want to reinvent the wheel.

Is anyone aware of software that will take a FASTA file and a corresponding VCF file and introduce the mutations from the VCF into the FASTA sequence?

How do you manage overlapping mutations and heterozygous mutations?

This would make a decent code golf challenge.

Alternatively, maybe use FastG.

I don't - I was hoping someone else did :) I might pull something together myself but for initial simplicity I would ignore both cases you mention. Neither is important to the downstream testing in my application. I could envisage selecting overlapping mutations randomly. The heterozygous bit would be more complicated. As I said though - neither is important to my particular application.

11.1 years ago

This is a duplicate question.

You want the following, from GATK:

java -Xmx2g -jar GenomeAnalysisTK.jar \
-R MY_REFERENCE.fa \
-T FastaAlternateReferenceMaker \
-o MY_REFERENCE_WITH_SNPS_FROM_VCF.fa \
--variant MY_VCF_IN_VCF_4.0_FORMAT.vcf


Thanks David. I had actually searched for previous questions on the topic and retrieved nothing! Cheers.

11.1 years ago
Aaron H ▴ 170

I wrote a script that assumes no overlapping mutations and only biallelic sites, and discards heterozygous sites. It depends on Biopython, but only for reading the FASTA file, so that dependency is easy to remove.

https://github.com/aihardin/utils/blob/master/vcf2fasta.py
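
For readers who just want the idea, here is a minimal sketch of that kind of approach in plain Python. It is not the linked script. The function names (parse_vcf_lines, apply_variants) are made up for illustration, and it assumes a single-sample VCF, keeps only biallelic homozygous-alt records, and skips any variant that overlaps one already applied.

```python
def parse_vcf_lines(lines):
    """Yield (chrom, pos, ref, alt) for biallelic, homozygous-alt records.

    Assumes tab-separated VCF lines with at most one sample column
    (an assumption of this sketch, not a VCF requirement).
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if "," in alt or alt.startswith("<") or alt in (".", "*"):
            continue  # skip multi-allelic, symbolic, or missing ALT
        if len(fields) >= 10:
            keys = fields[8].split(":")
            sample = fields[9].split(":")
            if "GT" in keys:
                gt = sample[keys.index("GT")].replace("|", "/")
                if set(gt.split("/")) != {"1"}:
                    continue  # toss heterozygous and hom-ref genotypes
        yield chrom, pos, ref, alt


def apply_variants(seq, variants):
    """Return seq with (pos, ref, alt) variants applied; pos is 1-based.

    Variants are processed left to right. A variant that starts inside
    the reference span of one already applied is skipped.
    """
    out = []
    cursor = 0  # 0-based index of the next unconsumed reference base
    for pos, ref, alt in sorted(variants):
        start = pos - 1
        if start < cursor:
            continue  # overlaps a previously applied variant
        if seq[start:start + len(ref)].upper() != ref.upper():
            raise ValueError(f"REF mismatch at position {pos}")
        out.append(seq[cursor:start])  # untouched reference up to the variant
        out.append(alt)                # the alternate allele
        cursor = start + len(ref)      # jump past the replaced reference bases
    out.append(seq[cursor:])
    return "".join(out)
```

To use it, group the records from parse_vcf_lines by chromosome, call apply_variants once per FASTA record, and write the returned string back out under the original header. Building the output in a single left-to-right pass means indels do not shift the coordinates of variants further along.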

9.7 years ago

There is a very well-written tool for this. It's called Personal Genome Constructor.

http://alleleseq.gersteinlab.org/tools.html