I need to align a set of protein sequences to Pfam seed alignments so that I can then calculate the degree of conservation of certain regions of my proteins. I am working with Pfam seed alignments rather than full alignments because I have to do this for over a thousand families and some of the full alignments are too large to handle.
So far I have done the following for one of my families:
- Download the unaligned seed and the HMM from Pfam
- Merge the sequences of my proteins with the sequences of the unaligned seed
- Run: hmmalign -o outputfile --trim hmmfile seqfile (the seqfile contains both the sequences of my proteins and the sequences of the unaligned seed)
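In case it helps anyone reproduce this, here is a minimal Python sketch of the steps above, written so that it could be looped over many families. The file names (seed.fasta, mine.fasta, merged.fasta, aligned.sto) and the function names are my own placeholders, not anything Pfam or HMMER prescribes, and it assumes the seed and the HMM have already been downloaded and that hmmalign is on the PATH. The hmmalign call is the same one as in the list.

```python
import shutil
import subprocess
from pathlib import Path


def merge_fasta(inputs, merged_path):
    """Concatenate FASTA files into one file, ensuring each part ends with a newline."""
    with open(merged_path, "w") as out:
        for path in inputs:
            text = Path(path).read_text()
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
    return merged_path


def hmmalign_command(hmm_path, seq_path, out_path):
    """Build the hmmalign call used in the post: -o for the output file, plus --trim."""
    return ["hmmalign", "-o", str(out_path), "--trim", str(hmm_path), str(seq_path)]


def align_family(hmm_path, seed_fasta, query_fasta, workdir):
    """Merge the unaligned seed with my own sequences and align both to the family HMM."""
    workdir = Path(workdir)
    # Placeholder file names; hmmalign writes Stockholm format by default.
    merged = merge_fasta([seed_fasta, query_fasta], workdir / "merged.fasta")
    aligned = workdir / "aligned.sto"
    if shutil.which("hmmalign") is None:
        raise RuntimeError("hmmalign not found on PATH")
    subprocess.run(hmmalign_command(hmm_path, merged, aligned), check=True)
    return aligned
```

For a single family this would be called as align_family("family.hmm", "seed.fasta", "mine.fasta", "."), where all four arguments are again placeholder names.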
The alignment I get is quite good overall, but I can see some differences (e.g. some gaps appear, some disappear) when comparing it to the aligned seed from Pfam.
Is the procedure I'm following correct? I'm new to the field and would very much like to have an expert opinion on this.
Thank you, Juan