Question: Introducing Known Mutations (From A Vcf) Into A Fasta File
7.2 years ago, Travis (USA) wrote:

Hi,

This could probably be coded easily enough but I don't want to reinvent the wheel.

Is anyone aware of software that will take a FASTA file and a corresponding VCF file and introduce the mutations from the VCF into the FASTA sequence?

Thanks in advance.

modified 5.9 years ago by Ashutosh Pandey • written 7.2 years ago by Travis

How do you handle overlapping mutations and heterozygous mutations?

Reply written 7.2 years ago by Pierre Lindenbaum

This would make a decent code golf challenge.

Reply written 7.2 years ago by Jeremy Leipzig

Alternatively, maybe use FASTG.

Reply written 5.9 years ago by skymningen

I don't - I was hoping someone else did :) I might pull something together myself but for initial simplicity I would ignore both cases you mention. Neither is important to the downstream testing in my application. I could envisage selecting overlapping mutations randomly. The heterozygous bit would be more complicated. As I said though - neither is important to my particular application.

Reply written 7.2 years ago by Travis
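
The "select overlapping mutations randomly" idea mentioned above could be sketched roughly as follows. This is only an illustration, not code from any tool in this thread: the function name `pick_non_overlapping` and the `(pos, ref, alt)` tuple layout are assumptions, positions are taken to be 1-based as in VCF, and all records are assumed to lie on a single contig.

```python
import random


def pick_non_overlapping(variants, seed=None):
    """Keep one randomly chosen variant from each group of overlapping ones.

    variants: iterable of (pos, ref, alt) tuples, pos 1-based (hypothetical layout).
    A variant covers reference positions pos .. pos + len(ref) - 1.
    """
    rng = random.Random(seed)
    clusters = []      # each cluster holds variants whose reference spans overlap
    cluster_end = 0    # last reference position covered by the current cluster
    for var in sorted(variants, key=lambda v: v[0]):
        pos, ref, _alt = var
        end = pos + len(ref) - 1
        if clusters and pos <= cluster_end:
            # Overlaps the current cluster: add it and extend the covered span.
            clusters[-1].append(var)
            cluster_end = max(cluster_end, end)
        else:
            # Starts past the current cluster: open a new one.
            clusters.append([var])
            cluster_end = end
    return [rng.choice(cluster) for cluster in clusters]
```

Passing a fixed `seed` makes the selection reproducible, which may matter if the modified FASTA is used as a test fixture.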
7.2 years ago, David Quigley (San Francisco) wrote:

This is a duplicate question.

You want the following, from GATK:

java -Xmx2g -jar GenomeAnalysisTK.jar \
-R MY_REFERENCE.fa \
-T FastaAlternateReferenceMaker \
-o MY_REFERENCE_WITH_SNPS_FROM_VCF.fa \
--variant MY_VCF_IN_VCF_4.0_FORMAT.vcf

See also this question, this question.

written 7.2 years ago by David Quigley

Thanks David. I had actually searched for previous questions on the topic and retrieved nothing! Cheers.

Reply written 7.2 years ago by Travis
7.2 years ago, Aaron H (United States/San Francisco/UCSF) wrote:

I wrote a script that assumes there are no overlapping mutations and that all sites are biallelic, and that discards heterozygous sites. It also depends on Biopython, but only for reading the FASTA file, so that dependency is easy to remove.

https://github.com/aihardin/utils/blob/master/vcf2fasta.py

written 7.2 years ago by Aaron H
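
For readers who want to see the general idea without opening the link, here is a minimal sketch under the same simplifying assumptions (no overlapping variants, biallelic sites, heterozygous sites dropped). It is not the linked script: the names `is_homozygous_alt` and `apply_variants` and the `(pos, ref, alt, genotype)` tuple layout are made up for illustration, and it works on a plain string for one contig rather than parsing FASTA or VCF files.

```python
def is_homozygous_alt(genotype):
    """True for GT strings such as '1/1' or '1|1' (biallelic sites assumed)."""
    alleles = genotype.replace("|", "/").split("/")
    return all(allele == "1" for allele in alleles)


def apply_variants(sequence, variants):
    """Return `sequence` with homozygous-alt variants substituted in.

    variants: iterable of (pos, ref, alt, genotype), pos 1-based as in VCF.
    Raises ValueError on overlapping records or a REF allele that does not
    match the sequence.
    """
    pieces = []
    cursor = 0  # 0-based index of the next reference base not yet copied
    for pos, ref, alt, genotype in sorted(variants):
        if not is_homozygous_alt(genotype):
            continue  # drop heterozygous (and hom-ref) sites
        start = pos - 1
        if start < cursor:
            raise ValueError(f"overlapping variant at position {pos}")
        if sequence[start:start + len(ref)] != ref:
            raise ValueError(f"REF mismatch at position {pos}")
        pieces.append(sequence[cursor:start])  # unchanged bases before the variant
        pieces.append(alt)                     # ALT replaces the REF span
        cursor = start + len(ref)
    pieces.append(sequence[cursor:])           # remainder of the reference
    return "".join(pieces)
```

For example, `apply_variants("ACGTACGTAC", [(2, "C", "T", "1/1")])` returns `"ATGTACGTAC"`. Building the result from slices, rather than editing the string in place, keeps the original coordinates valid even after indels change the sequence length.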
5.9 years ago, Ashutosh Pandey (Philadelphia) wrote:

There is a very well-written tool for this. It's called Personal Genome Constructor.

http://alleleseq.gersteinlab.org/tools.html

written 5.9 years ago by Ashutosh Pandey