Question: Convert VCF to new proteinfasta containing SNVs
gravatar for mosquitoes
2.8 years ago by
United States
mosquitoes0 wrote:


I am trying to write a new fasta of protein sequences which contains all of the SNVs I have identified from WGS for one strain. I know roughly how to get there, but not the exact tools available.

So far, I have:

  1. Created a new gene fasta using gatk's FastaAlternateReferenceMaker. Using the -L option to only write genes into the fasta.

I know I can use biopython to convert this DNA fasta to an AA fasta, yet all of the genes on the reverse strand are reverse complemented in the new gene fasta. Is there a way to either change the negative strand genes to their reverse complement or tell a program this when it is translating the sequences. I could use the bed file as a reference/dictionary.


python biopython vcf gatk fasta • 974 views
ADD COMMENTlink modified 2.8 years ago by Brice Sarver2.6k • written 2.8 years ago by mosquitoes0

Does the fasta file made by FastaAlternateReferenceMaker provide any information from which strand it has extracted the gene? E.g. are there numbers in the `>' line that make it obvious which strand was used? Can you provide an example of this fasta file?

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by Markus250
gravatar for Brice Sarver
2.8 years ago by
Brice Sarver2.6k
United States
Brice Sarver2.6k wrote:

See the .reverse_complement() method in Biopython.

ADD COMMENTlink written 2.8 years ago by Brice Sarver2.6k

Right, but I need to do this for the whole genome, where approximately half of the genes in the current fasta need to be reverse complemented. The only way to know which ones is by looking at the .bed

ADD REPLYlink written 2.8 years ago by mosquitoes0

A BED file is just a tab-delimited file. As you extract the positions for your genes of interest, also extract the strand information. Do a logical evaluation: if "-" then reverse complement the sequence. You'll want to evaluate this for each region you extract, i.e., for each exon.

ADD REPLYlink written 2.8 years ago by Brice Sarver2.6k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 713 users visited in the last hour