Off topic: How to sort protein and nucleotide sequences based on IDs
5.3 years ago
Kay ▴ 10

I have a protein alignment file in FASTA format and the nucleotide sequences of the original proteins. How can I sort my nucleotide sequences so that their IDs appear in the same order as in the alignment file? Assume my protein sequences look like this:

>Seq1
MYHLGQGDPEPDIN
>Seq2
MYGGQGDPEPYIN
>Seq3
MYHLHQGDPEPD
>Seq4
TYHLGQGDPEPDIN

Here are the corresponding nucleotides:

>Seq1
ACTTTTGATACAATTAACAGGACGAAAATAATAGAAAAGCTAAAGCATCTTAGAATCCCA
>Seq2
AATCCCAGACAAATTAAGACATATTCTAACAGTGAGTCTACAGAACACAGAACACTATAG
>Seq3
AGTTTTGCAATGGTAAATTATTTTGAAGAGTTTATAGGTCGTGTCTGGAACTGCAATTAT
>Seq4
TGGAATATTAGACGAATTCCATACACAGCACCTATTGTAATATTCATAGATTTCAAAAGC

My protein alignment looks like this:

>Seq4
MYHLG-QGDPEPDIN
>Seq2
MYGGQG-DPEPY-IN
>Seq1
MYHLHQ--GDPEP-D
>Seq3
--TYHLGQGDPEPDIN

How do I get my nucleotide sequences to now look like:

>Seq4
TGGAATATTAGACGAATTCCATACACAGCACCTATTGTAATATTCATAGATTTCAAAAGC
>Seq2
AATCCCAGACAAATTAAGACATATTCTAACAGTGAGTCTACAGAACACAGAACACTATAG
>Seq1
ACTTTTGATACAATTAACAGGACGAAAATAATAGAAAAGCTAAAGCATCTTAGAATCCCA
>Seq3
AGTTTTGCAATGGTAAATTATTTTGAAGAGTTTATAGGTCGTGTCTGGAACTGCAATTAT

This matters especially because I have many sequences. Thanks.
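One possible way to do this, sketched in plain Python with no third-party libraries: load the nucleotide records into a dictionary keyed by ID, then walk the alignment file and emit the nucleotide records in the order the alignment IDs appear. The helper names (`read_fasta`, `seq_id`, `reorder_by_alignment`) are made up for this illustration, and the sketch assumes that the ID is the first whitespace-delimited token of each header and that every alignment ID also exists in the nucleotide file.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs; header is the text after '>'."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # supports sequences wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)

def seq_id(header):
    """Assumption: the ID is the first whitespace-delimited token of the header."""
    parts = header.split()
    return parts[0] if parts else ""

def reorder_by_alignment(alignment_handle, nucleotide_handle):
    """Return nucleotide (header, sequence) pairs in alignment ID order."""
    nucleotides = {seq_id(h): (h, s) for h, s in read_fasta(nucleotide_handle)}
    ordered = []
    for header, _ in read_fasta(alignment_handle):
        key = seq_id(header)
        if key not in nucleotides:
            raise KeyError(f"alignment ID {key!r} not found in nucleotide file")
        ordered.append(nucleotides[key])
    return ordered

# Demo with the data from the question, held in memory instead of files.
alignment = io.StringIO(
    ">Seq4\nMYHLG-QGDPEPDIN\n>Seq2\nMYGGQG-DPEPY-IN\n"
    ">Seq1\nMYHLHQ--GDPEP-D\n>Seq3\n--TYHLGQGDPEPDIN\n"
)
nucleotide = io.StringIO(
    ">Seq1\nACTTTTGATACAATTAACAGGACGAAAATAATAGAAAAGCTAAAGCATCTTAGAATCCCA\n"
    ">Seq2\nAATCCCAGACAAATTAAGACATATTCTAACAGTGAGTCTACAGAACACAGAACACTATAG\n"
    ">Seq3\nAGTTTTGCAATGGTAAATTATTTTGAAGAGTTTATAGGTCGTGTCTGGAACTGCAATTAT\n"
    ">Seq4\nTGGAATATTAGACGAATTCCATACACAGCACCTATTGTAATATTCATAGATTTCAAAAGC\n"
)
ordered = reorder_by_alignment(alignment, nucleotide)
for header, sequence in ordered:
    print(f">{header}\n{sequence}")
```

With real files, the two `io.StringIO` objects would be replaced by handles from `open(...)` on the alignment and nucleotide FASTA files, and the output redirected or written to a new file. The dictionary lookup keeps the reordering fast even for many sequences, at the cost of holding all nucleotide records in memory.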

fasta • 779 views