How to generate a multiple sequence alignment that retains the annotations from the original sequences?
2.7 years ago
DNAlias ▴ 40

I want to generate a multiple sequence alignment (MSA) that retains the annotations I made on a list of protein sequences (e.g. in Stockholm file format). Is there a way to do this? Maybe with Biopython or Biostrings?

In the end, all I want is an MSA with the same annotations as the original sequences, so any way of adding annotations to an MSA that lacks them will work as well.

sequence alignment • 1.8k views
2.7 years ago
Mensur Dlakic ★ 20k

MUSCLE can do this. If you save your alignment in aligned FASTA (.afa) format, which is the default, all sequence headers will be preserved. Assuming your starting file is protein.fas:

muscle -in protein.fas -out protein.afa


After that, the alignment can be converted to Stockholm format using HMMER's esl-reformat utility:

esl-reformat stockholm protein.afa > protein.sto
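If the annotations are per-residue (for example domain boundaries) rather than per-sequence, the headers alone will not carry them. One possible approach is to shift each domain's coordinates by the gaps MUSCLE inserted and write them as `#=GR` lines in the Stockholm file. The sketch below is a minimal, pure-Python illustration of that idea, not a tested pipeline: the sequence names, the domain coordinates, and the `DOM` feature tag are all made up for the example, and it assumes domains are given as 1-based inclusive positions on the ungapped sequences.

```python
def read_aligned_fasta(text):
    """Parse aligned FASTA text into an ordered {name: gapped_sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word of the header
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def ungapped_to_column(gapped, gap_chars="-."):
    """Index i of the result is the 0-based alignment column of residue i."""
    return [col for col, ch in enumerate(gapped) if ch not in gap_chars]


def annotation_row(gapped, domains, gap_chars="-."):
    """Build a per-column annotation string for one aligned sequence.

    domains: list of (start, end, symbol) with 1-based inclusive coordinates
    on the UNGAPPED sequence; symbol is a single character.
    Columns that hold a gap or lie outside every domain get '.'.
    """
    cols = ungapped_to_column(gapped, gap_chars)
    row = ["."] * len(gapped)
    for start, end, symbol in domains:
        for residue in range(start - 1, end):
            row[cols[residue]] = symbol
    return "".join(row)


def to_stockholm(records, domains_by_seq, feature="DOM"):
    """Write a Stockholm alignment with one #=GR line per annotated sequence.

    'DOM' is a hypothetical feature tag chosen for this example.
    """
    # Pad names so that sequence lines and #=GR lines start in the same column.
    width = max(len(name) for name in records) + len(feature) + 6
    lines = ["# STOCKHOLM 1.0"]
    for name, gapped in records.items():
        lines.append(f"{name:<{width}} {gapped}")
        if name in domains_by_seq:
            tag = f"#=GR {name} {feature}"
            row = annotation_row(gapped, domains_by_seq[name])
            lines.append(f"{tag:<{width}} {row}")
    lines.append("//")
    return "\n".join(lines) + "\n"


# Toy alignment and made-up domain coordinates, for illustration only.
aligned = """>seqA
MK-TAYIA
>seqB
MKQTA-IA
"""
domains = {"seqA": [(3, 5, "A")], "seqB": [(3, 5, "A")]}
print(to_stockholm(read_aligned_fasta(aligned), domains))
```

The domain table would come from whatever you exported alongside the sequences. The only requirement is that the names match the FASTA headers that the aligner preserved.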


My files are currently in GenBank format. Is there a way to convert them to FASTA while retaining the annotations, so that they are compatible with MUSCLE?


Also, these are annotations I made on domains within each sequence, as opposed to annotations on the sequence as a whole.


My sequences are currently in Geneious, so I have an annotation table, but I can only export the file with annotations as GenBank.


The old version of esl-reformat is sreformat, and it can convert GenBank to FASTA.

sreformat fasta genome.gbk > genome.fas


You will need to go back to an older HMMER version, I think v2.3, to find this program.

If this doesn't work, Google is your friend. There should be plenty of programs or scripts to convert GenBank to FASTA.
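As one example of such a script, here is a rough pure-Python sketch that pulls the LOCUS name and the ORIGIN sequence out of GenBank flat-file text and writes FASTA. It assumes well-formed records terminated by `//`, and the function name `genbank_to_fasta` is just something made up for this example. It drops the feature table, so domain annotations would still have to travel separately (e.g. as the exported annotation table). A full parser such as Biopython's `Bio.SeqIO` (for instance `SeqIO.convert("genome.gbk", "genbank", "genome.fas", "fasta")`) is likely more robust if it is available.

```python
def genbank_to_fasta(text, width=60):
    """Convert GenBank flat-file text to FASTA text (names and sequences only)."""
    out = []
    name, chunks, in_origin = None, [], False
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            name = line.split()[1]          # second field is the locus name
        elif line.startswith("ORIGIN"):
            in_origin = True                # sequence lines follow
        elif line.startswith("//"):
            # End of record: emit it and reset the state for the next one.
            if name is not None:
                sequence = "".join(chunks).upper()
                out.append(f">{name}")
                out.extend(sequence[i:i + width]
                           for i in range(0, len(sequence), width))
            name, chunks, in_origin = None, [], False
        elif in_origin:
            # Drop the position numbers and spaces, keep only the letters.
            chunks.append("".join(ch for ch in line if ch.isalpha()))
    return "\n".join(out) + "\n" if out else ""


# Toy record, for illustration only.
example = """LOCUS       protA                     12 aa
ORIGIN
        1 mktayiakqr is
//
"""
print(genbank_to_fasta(example))
```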