Is it better to remove weakly aligned regions of proteins from an MSA, or to keep them, when building an HMM? Case: I have a set of homologs and I want to build an HMM (hidden Markov model) to detect their homologs in other species. Questions:
- Should I use all available homologs, or is there a reasonable limit (min: 5 or 15? max: 50, 100, 200)? I keep in mind that the alignment gets worse as more sequences are included, and MSA software has its own limitations as well.
- Which MSA program would you recommend? Personally, I like MUSCLE a lot, but I'm aware that MAFFT or T-Coffee perform better (though they are slower).
- Or should I run several aligners and use a consistency-based alignment (M-Coffee)?
- Should I trim badly aligned fragments (trimAl or Gblocks)?
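
To make the trimming question concrete, here is a minimal sketch of what I mean by "trimming": dropping alignment columns in which most sequences have a gap. This is only an illustration and not the actual algorithm of trimAl or Gblocks; the function name `trim_gappy_columns`, the 0.5 threshold, and the toy sequences are all made up for the example.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5, gap_chars="-."):
    """Keep only the columns whose gap fraction is <= max_gap_fraction.

    `alignment` maps sequence names to aligned sequences of equal length.
    Hypothetical helper for illustration; not trimAl's or Gblocks' method.
    """
    if not alignment:
        return {}
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    n = len(seqs)
    # Indices of columns that are not dominated by gaps.
    keep = [
        i for i in range(length)
        if sum(s[i] in gap_chars for s in seqs) / n <= max_gap_fraction
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy alignment (invented data): columns 3, 8 and 9 are gaps in 3 of 4 sequences.
toy = {
    "seqA": "MK-LVAG--T",
    "seqB": "MKQLVAG--T",
    "seqC": "MK-LIAGPQT",
    "seqD": "MR-LVSG--T",
}
trimmed = trim_gappy_columns(toy, max_gap_fraction=0.5)
for name, seq in trimmed.items():
    print(name, seq)
```

With the threshold at 0.5, the three gap-dominated columns are removed and the remaining seven columns are kept for every sequence. The question is whether feeding the trimmed alignment to the HMM builder is preferable to feeding it the full one.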