5.3 years ago
Kay
I have a protein alignment file in FASTA format and the nucleotide sequences of the original proteins. How can I sort my nucleotide sequences so that their IDs appear in the same order as in the alignment file? Suppose my protein sequences look like this:
>Seq1
MYHLGQGDPEPDIN
>Seq2
MYGGQGDPEPYIN
>Seq3
MYHLHQGDPEPD
>Seq4
TYHLGQGDPEPDIN
Here are the corresponding nucleotides:
>Seq1
ACTTTTGATACAATTAACAGGACGAAAATAATAGAAAAGCTAAAGCATCTTAGAATCCCA
>Seq2
AATCCCAGACAAATTAAGACATATTCTAACAGTGAGTCTACAGAACACAGAACACTATAG
>Seq3
AGTTTTGCAATGGTAAATTATTTTGAAGAGTTTATAGGTCGTGTCTGGAACTGCAATTAT
>Seq4
TGGAATATTAGACGAATTCCATACACAGCACCTATTGTAATATTCATAGATTTCAAAAGC
And suppose my protein alignment looks like this:
>Seq4
MYHLG-QGDPEPDIN
>Seq2
MYGGQG-DPEPY-IN
>Seq1
MYHLHQ--GDPEP-D
>Seq3
--TYHLGQGDPEPDIN
How do I get my nucleotide sequences into that same order, so that they look like this:
>Seq4
TGGAATATTAGACGAATTCCATACACAGCACCTATTGTAATATTCATAGATTTCAAAAGC
>Seq2
AATCCCAGACAAATTAAGACATATTCTAACAGTGAGTCTACAGAACACAGAACACTATAG
>Seq1
ACTTTTGATACAATTAACAGGACGAAAATAATAGAAAAGCTAAAGCATCTTAGAATCCCA
>Seq3
AGTTTTGCAATGGTAAATTATTTTGAAGAGTTTATAGGTCGTGTCTGGAACTGCAATTAT
I am especially interested in an approach that works when I have many sequences. Thanks.
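For illustration, here is a minimal sketch of the reordering described above, in plain Python with no external libraries. It assumes that the first whitespace-delimited word of each header is the ID, that IDs are unique, and that both files fit in memory. The function names (`read_fasta`, `reorder_by_alignment`, `write_fasta`) are hypothetical, chosen only for this example.

```python
import sys


def read_fasta(lines):
    """Parse FASTA lines into a list of (id, sequence) pairs, in file order.

    Assumption: the ID is the first whitespace-delimited word after '>'.
    """
    records = []
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


def reorder_by_alignment(alignment_lines, nucleotide_lines):
    """Return nucleotide records in the order their IDs appear in the alignment."""
    order = [seq_id for seq_id, _ in read_fasta(alignment_lines)]
    nucleotides = dict(read_fasta(nucleotide_lines))  # ID -> sequence lookup
    missing = [seq_id for seq_id in order if seq_id not in nucleotides]
    if missing:
        raise KeyError("IDs missing from nucleotide file: " + ", ".join(missing))
    return [(seq_id, nucleotides[seq_id]) for seq_id in order]


def write_fasta(records, handle, width=60):
    """Write (id, sequence) pairs as FASTA, wrapping sequences at `width`."""
    for seq_id, seq in records:
        handle.write(">" + seq_id + "\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")


# Small in-memory demo using the IDs and sequences from the question.
alignment = [">Seq4", "MYHLG-QGDPEPDIN", ">Seq2", "MYGGQG-DPEPY-IN",
             ">Seq1", "MYHLHQ--GDPEP-D", ">Seq3", "--TYHLGQGDPEPDIN"]
nucleotides = [
    ">Seq1", "ACTTTTGATACAATTAACAGGACGAAAATAATAGAAAAGCTAAAGCATCTTAGAATCCCA",
    ">Seq2", "AATCCCAGACAAATTAAGACATATTCTAACAGTGAGTCTACAGAACACAGAACACTATAG",
    ">Seq3", "AGTTTTGCAATGGTAAATTATTTTGAAGAGTTTATAGGTCGTGTCTGGAACTGCAATTAT",
    ">Seq4", "TGGAATATTAGACGAATTCCATACACAGCACCTATTGTAATATTCATAGATTTCAAAAGC",
]
write_fasta(reorder_by_alignment(alignment, nucleotides), sys.stdout)
```

With real files, the same functions can be given open file handles instead of lists, since they only iterate over lines. The dictionary lookup keeps the reordering linear in the number of sequences, so it should cope with large inputs as long as the nucleotide file fits in memory.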