Convert VCF to new protein fasta containing SNVs
7.7 years ago
mosquitoes • 0

Hello,

I am trying to write a new fasta of protein sequences which contains all of the SNVs I have identified from WGS for one strain. I know roughly how to get there, but not the exact tools available.

So far, I have:

  1. Created a new gene fasta using GATK's FastaAlternateReferenceMaker, with the -L option so that only genes are written into the fasta.

I know I can use Biopython to translate this DNA fasta into an amino acid fasta, but all of the genes on the reverse strand are written as the reverse complement of their coding sequence in the new gene fasta. Is there a way either to convert the negative-strand genes to their reverse complement, or to tell the translating program which strand each gene is on? I could use the BED file as a reference/dictionary.

Thanks!

gatk biopython python fasta vcf • 2.2k views

Does the fasta file made by FastaAlternateReferenceMaker provide any information about which strand each gene was extracted from? E.g., are there numbers in the ">" line that make it obvious which strand was used? Can you provide an example of this fasta file?

7.7 years ago
Brice Sarver ★ 3.8k

See the .reverse_complement() method in Biopython.
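
As a minimal sketch of that method, the snippet below reverse complements a single sequence and translates it. The sequence is made up for illustration and stands in for a minus-strand gene as it would appear in a forward-strand fasta; it assumes Biopython is installed.

```python
from Bio.Seq import Seq

# Hypothetical minus-strand gene, written in forward-strand orientation.
forward_strand = Seq("TTAGGCCAT")

# Flip it into coding orientation, then translate up to the first stop codon.
coding = forward_strand.reverse_complement()
protein = coding.translate(to_stop=True)

print(coding)
print(protein)
```

The same two calls work on the `.seq` attribute of each record returned by `Bio.SeqIO.parse`.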


Right, but I need to do this for the whole genome, where approximately half of the genes in the current fasta need to be reverse complemented. The only way to know which ones is by looking at the .bed file.


A BED file is just a tab-delimited file. As you extract the positions for your genes of interest, also extract the strand information. Do a logical evaluation: if "-" then reverse complement the sequence. You'll want to evaluate this for each region you extract, i.e., for each exon.
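
The logical evaluation described above might look roughly like the sketch below. It is only an illustration under some assumptions: the BED file has at least six columns with the feature name in column 4 and the strand in column 6, and each fasta record can be matched to a BED row by that name. The gene names, coordinates, and sequences are invented, and how FastaAlternateReferenceMaker labels its records should be checked against the actual output. It uses only the standard library; Biopython's `.reverse_complement()` could replace the helper.

```python
import csv
import io

# Complement table for DNA, including N and lowercase (soft-masked) bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_strands(bed_handle):
    """Map each BED feature name (column 4) to its strand (column 6)."""
    strands = {}
    for row in csv.reader(bed_handle, delimiter="\t"):
        # Skip blank lines, header lines, and rows without a strand column.
        if len(row) < 6 or row[0].startswith(("#", "track", "browser")):
            continue
        strands[row[3]] = row[5]
    return strands


def orient(sequences, strands):
    """Reverse complement every sequence whose BED strand is '-'."""
    return {
        name: reverse_complement(seq) if strands.get(name) == "-" else seq
        for name, seq in sequences.items()
    }


# Toy data: hypothetical gene names, coordinates, and sequences.
bed_text = "chr1\t100\t109\tgeneA\t0\t+\nchr1\t200\t209\tgeneB\t0\t-\n"
sequences = {"geneA": "ATGGCCTAA", "geneB": "TTAGGCCAT"}

oriented = orient(sequences, read_strands(io.StringIO(bed_text)))
print(oriented)
```

Sequences with no matching BED entry are left unchanged here; depending on the data, raising an error for them may be the safer choice.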
