Removing insertions from Stockholm format multiple sequence alignment file
11 weeks ago
becko • 0

I would like to remove all insertion columns from a multiple sequence alignment in Stockholm format (https://en.wikipedia.org/wiki/Stockholm_format). Are there any tools / scripts out there that facilitate this task?

11 weeks ago
Mensur Dlakic ★ 21k

It depends on what exactly you want. Do you want to create a gapless FASTA file? The esl-reformat utility in the HMMER package can do that, as can the Perl script reformat.pl in HH-suite.

As I said, I want to remove insertion columns. These are annotated as ~ or . in the secondary-structure annotation in Stockholm format. This is not the same as removing all gaps: I want to preserve deletions.

Unfortunately, I have not found a utility in HMMER or Infernal that lets me do this. But maybe I'm missing something. Thanks!
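To make the request concrete, here is a minimal Python sketch of the filter described above. It is not part of HMMER or Infernal. The function name strip_insert_columns is made up for illustration, and the sketch assumes a single-block (non-interleaved) Stockholm file that has a #=GC SS_cons line in which insertion columns are marked with ~ or . characters.

```python
def strip_insert_columns(stockholm_text, insert_chars="~."):
    """Drop every column whose #=GC SS_cons character is in insert_chars.

    Assumes a single-block (non-interleaved) Stockholm alignment.
    """
    lines = stockholm_text.splitlines()

    # Find the consensus secondary-structure annotation line.
    ss_cons = None
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and fields[0] == "#=GC" and fields[1] == "SS_cons":
            ss_cons = fields[2]
            break
    if ss_cons is None:
        raise ValueError("no '#=GC SS_cons' line found")

    # True for columns to keep, False for insertion columns.
    keep = [ch not in insert_chars for ch in ss_cons]

    def filter_cols(aligned):
        if len(aligned) != len(keep):
            raise ValueError("aligned text does not match SS_cons length")
        return "".join(ch for ch, k in zip(aligned, keep) if k)

    out = []
    for line in lines:
        stripped = line.strip()
        is_sequence = bool(stripped) and not stripped.startswith("#") and stripped != "//"
        is_column_annotation = line.startswith(("#=GC", "#=GR"))
        if is_sequence or is_column_annotation:
            # The aligned text is the last whitespace-delimited field.
            prefix, _, aligned = line.rstrip().rpartition(" ")
            out.append(prefix + " " + filter_cols(aligned))
        else:
            # Header, #=GF / #=GS lines, blank lines and '//' pass through.
            out.append(line)
    return "\n".join(out) + "\n"


example = """# STOCKHOLM 1.0
seq1         ACg.GU
seq2         A-.uGU
#=GC SS_cons <<..>>
//
"""
print(strip_insert_columns(example))
```

In this toy input the deletion in seq2 (the - in the second column) is kept, because that column is not marked as an insertion in SS_cons. Only the two middle columns are removed.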

I don't think what you want can be done without a reference sequence. In most cases the first sequence in the alignment (usually the query that was used to collect all the sequences) serves as the reference, and all columns that are insertions relative to the reference can be removed. I don't know of a program that can do so based on SS annotation in Stockholm or any other format.
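The reference-based approach can be sketched in a few lines of Python. This is only an illustration, not an existing tool: the function name drop_insertions_relative_to_reference and the choice of gap characters are assumptions, and the input is taken to be a list of (name, aligned sequence) pairs already parsed from the alignment, with the reference first.

```python
def drop_insertions_relative_to_reference(records, ref_index=0, gap_chars="-.~"):
    """Remove every column in which the reference sequence has a gap.

    records is a list of (name, aligned_sequence) pairs of equal length.
    """
    ref = records[ref_index][1]
    if any(len(seq) != len(ref) for _, seq in records):
        raise ValueError("all aligned sequences must have the same length")

    # Columns where the reference has a residue are the ones to keep.
    keep = [i for i, ch in enumerate(ref) if ch not in gap_chars]

    return [(name, "".join(seq[i] for i in keep)) for name, seq in records]


records = [
    ("query", "AC--GU"),
    ("hit1", "ACGAGU"),
    ("hit2", "A---GU"),
]
for name, seq in drop_insertions_relative_to_reference(records):
    print(name, seq)
```

Gaps in the other sequences at columns where the reference has a residue (deletions relative to the reference) are left in place, as in hit2 here.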

After compiling HMMER, esl-reformat will be in the easel/miniapps subdirectory.