I have files containing multiple sequence alignments of proteins. Each file has the following format:
Species one, gene x
(Protein sequence of gene x)
Species two, gene y
(Protein sequence of gene y)
Species three, gene z
(Protein sequence of gene z)
I also have complete CDS FASTA files for all the species involved:
species_one_cds.fa, species_two_cds.fa, species_three_cds.fa
I need something that reads the headers in the protein alignment, finds the matching headers in the CDS FASTA files, and writes out the CDS sequence for each protein in the alignment. The final product would look something like this:
Species one, gene x
(CDS sequence of gene x)
Species two, gene y
(CDS sequence of gene y)
Species three, gene z
(CDS sequence of gene z)
Is there a software package that can do this?
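To make the request concrete, here is a rough Python sketch of the lookup I have in mind. It is only an illustration under some assumptions: all files are FASTA with ">" header lines, the header text is identical between the protein alignment and the CDS files, and the function names (parse_fasta, build_cds_index, cds_for_alignment) are made up for this example. It returns the unaligned CDS sequences and does not carry the alignment gaps over to the nucleotides.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def build_cds_index(cds_sources: Iterable[Iterable[str]]) -> Dict[str, str]:
    """Map header -> CDS sequence across all CDS files.

    Assumption: headers are unique across files; a later duplicate
    silently overwrites an earlier one.
    """
    index: Dict[str, str] = {}
    for lines in cds_sources:
        for header, seq in parse_fasta(lines):
            index[header] = seq
    return index


def cds_for_alignment(
    protein_lines: Iterable[str], cds_index: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (found, missing) for the headers of a protein alignment.

    found keeps the order of the alignment; missing lists headers
    with no exact match in the CDS index.
    """
    found: List[Tuple[str, str]] = []
    missing: List[str] = []
    for header, _protein in parse_fasta(protein_lines):
        if header in cds_index:
            found.append((header, cds_index[header]))
        else:
            missing.append(header)
    return found, missing
```

With real files, open file handles can be passed directly, since iterating over a handle yields lines. Each (header, sequence) pair in found would then be written out as ">" plus the header, followed by the sequence. If the headers differ between the protein and CDS files (for example, extra annotation after the gene ID), the exact-match lookup above would need to be replaced by a comparison on a shared ID field.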