How can I convert amino acid sequences (large FASTA files) back into nucleotide sequences? Is there a software tool for this? I'm using a Mac. The FASTA headers include the frame too. Here's a small example:
>abc_frame=-1
SEETQLVPLGWPR*W*PWCLSPSRKTSLDLWHSNTQQCLQAAHSVHLESQFCWKCLSRY*
TCSLMNLCRMYIQ*ISFQSTPVLFLQAV*SNLCSSHQENKRPDR*WSDVDLAAQSQRSAV
STVHPSHMIQLPTAAELQETWFVLNLTCCE
>def_frame=-2
QRKHSWSLWGGRGDGDHGAFPPVVKTPIDSQYWHSNTQQCLQAAHSVHLESQFCWKCLSR
Y*TCSLMNLCSMKLQ*ISFQSTPVLFLQAV*SKL*SSHQENKRPDR*WSDVDLAAQSQRS
AVSTDHPSHMIQLPTAAELQETWFVLNLTCC
>ghi_frame=3
SQHVRFSTNHVSCSSAAVGSWIXCEG*TVDTADLCDCAARSTSDHHLSGLLFSW*LLXXX
DQTACRKRTGVDWNEIYWSFILQRFIKEQVQYRLRHFQQNCDSKWTECAA*RHCCVLLCQ
PTGRGIXGFRLLGKRHTGNSVISHPKGTNCVSS
Additional info:
I have a FASTA file with multiple nucleotide sequences, which I translated into amino acids with seqkit. From those amino acid sequences I filtered out a few based on some parameters, and I would like to write the discarded amino acid sequences to another FASTA file (nonprod_seq.fasta) and convert them back to nucleotides.
Here's my code:
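# Shell: translate all six frames and append the frame to each SeqID
seqkit translate nucleotide.fasta --frame 6 --append-frame -o protein.fasta
pip install biopython

# Python: split the translated records into productive and non-productive
from Bio import SeqIO

def is_productive(s):
    # Productive: contains at least one M and fewer than two stop codons (*)
    return 'M' in s.seq and s.seq.count('*') < 2

with open('prod_seq.fasta', 'w') as fw1, open('nonprod_seq.fasta', 'w') as fw2:
    for s in SeqIO.parse('protein.fasta', 'fasta'):
        SeqIO.write(s, fw1 if is_productive(s) else fw2, 'fasta')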
Is there any way to recover these sequences as nucleotides from the initial nucleotide.fasta file? Maybe via the SeqID? The problem is that the SeqIDs are not identical: in protein.fasta the frame is appended to the end of each SeqID, and there are no spaces in the SeqIDs.
Example of SeqIDs in both files:
nucleotide.fasta
>GAACACGAAGGACGC|PRCONS=Primer_Read1_Rod_Cmu|SEQORIENT=F|CREGION=Inner_Scomax_Cmu|CONSCOUNT=19|DUPCOUNT=8
protein.fasta
>GAACACGAAGGACGC|PRCONS=Primer_Read1_Rod_Cmu|SEQORIENT=F|CREGION=Inner_Scomax_Cmu|CONSCOUNT=19|DUPCOUNT=8_frame=1
Thank you in advance
The sequences in your example already look like amino acid sequences. In general, the process you are looking for is called translation, and you need only one of two tools for the task: transeq or getorf. Both are part of EMBOSS and can be installed on a Mac.
Hello, I made a mistake. I just edited the question, sorry. I need to convert amino acid sequences with frames into nucleotide sequences. Thank you
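A protein sequence cannot be back-translated to a unique nucleotide sequence, because most amino acids are encoded by more than one codon. Since the original nucleotide.fasta is still available, one option is to strip the frame suffix from each SeqID and look the record up in the original file. Below is a minimal sketch of that idea in plain Python. It assumes the suffix always has the form `_frame=N` at the very end of the ID (as in the examples above), and that negative frames are read from the reverse complement starting at offset |N| - 1; that frame convention is an assumption worth checking against a few of your own records. The function names (`split_frame`, `read_fasta`, `translated_region`, `recover_nucleotides`) and the output path are hypothetical.

```python
import re

# Assumption: the frame suffix looks like "_frame=N" at the end of the ID.
FRAME_SUFFIX = re.compile(r"_frame=(-?[1-3])$")

# Only A/C/G/T/N are complemented; other IUPAC codes are left unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def split_frame(protein_id):
    """Return (original_id, frame) for an ID such as 'abc_frame=-1'."""
    m = FRAME_SUFFIX.search(protein_id)
    if m is None:
        raise ValueError(f"no frame suffix in {protein_id!r}")
    return protein_id[:m.start()], int(m.group(1))


def read_fasta(path):
    """Yield (id, sequence) pairs; the ID is the header up to the first whitespace."""
    seq_id, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if seq_id is not None:
                    yield seq_id, "".join(chunks)
                fields = line[1:].split()
                seq_id, chunks = (fields[0] if fields else ""), []
            elif line and seq_id is not None:
                chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def translated_region(nt, frame):
    """Return the nucleotides read in `frame` (assumed convention, see above)."""
    if frame < 0:
        nt = nt.translate(_COMPLEMENT)[::-1]  # reverse complement
    sub = nt[abs(frame) - 1:]
    return sub[:len(sub) - len(sub) % 3]  # drop a trailing partial codon


def recover_nucleotides(nt_path, protein_path, out_path):
    """Write the nucleotide region behind every protein record; return the count."""
    nt = dict(read_fasta(nt_path))
    written = 0
    with open(out_path, "w") as out:
        for pid, _ in read_fasta(protein_path):
            base_id, frame = split_frame(pid)
            if base_id not in nt:
                continue  # skip IDs that are missing from the nucleotide file
            out.write(f">{pid}\n{translated_region(nt[base_id], frame)}\n")
            written += 1
    return written
```

For example, `recover_nucleotides('nucleotide.fasta', 'nonprod_seq.fasta', 'nonprod_nt.fasta')` would write one nucleotide record per discarded protein record, keeping the frame in the SeqID. If you only want the full original sequences, write `nt[base_id]` instead of the framed region. The same lookup can be done with Biopython's `SeqIO` if you prefer to stay with the library already used in the question.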