Dear community,
I couldn't find a direct answer to my question. I am planning to annotate a large batch of genomes and metagenomes with profile HMMs such as Pfam.
Because the data set is so large, I won't be able to optimize the alignments for each HMM individually. I am therefore looking for a good set of parameters for hmmsearch (inclusion parameters) and for post-processing thresholds on e-values and coverage, i.e. the length of the match relative to the length of the domain model. Could you please point me to references, or share what you know, about choosing values that are a good compromise? And would the answer change when processing metagenomes?
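To make the post-processing step concrete, here is a minimal Python sketch of the e-value and coverage filter described above. It assumes the hits were written with hmmsearch's --domtblout option, where the query is the HMM, so the "qlen" column is the model length and the "hmm from"/"hmm to" columns give the matched model positions. The function name filter_domtblout is hypothetical, and the default cutoffs (1e-5 and 0.7) are arbitrary placeholders, not recommended values.

```python
from typing import Iterable, Iterator, NamedTuple


class DomainHit(NamedTuple):
    target: str            # sequence that was hit
    model: str             # name of the profile HMM (the hmmsearch query)
    i_evalue: float        # independent (per-domain) E-value
    model_coverage: float  # fraction of the model covered by the match


def filter_domtblout(
    lines: Iterable[str],
    max_i_evalue: float = 1e-5,   # placeholder cutoff, not a recommendation
    min_coverage: float = 0.7,    # placeholder cutoff, not a recommendation
) -> Iterator[DomainHit]:
    """Yield per-domain hits passing an E-value and a model-coverage cutoff."""
    for line in lines:
        # Skip blank lines and the '#' header/footer lines.
        if not line.strip() or line.startswith("#"):
            continue
        # 22 whitespace-separated columns, then a free-text description.
        fields = line.split(maxsplit=22)
        model_length = int(fields[5])        # qlen
        i_evalue = float(fields[12])         # i-Evalue
        hmm_from = int(fields[15])           # hmm coord from
        hmm_to = int(fields[16])             # hmm coord to
        coverage = (hmm_to - hmm_from + 1) / model_length
        if i_evalue <= max_i_evalue and coverage >= min_coverage:
            yield DomainHit(fields[0], fields[3], i_evalue, coverage)
```

With a hypothetical output file hits.domtblout, this could be used as `hits = list(filter_domtblout(open("hits.domtblout")))`. Keeping both cutoffs as arguments makes it easy to try different values per model family instead of hard-coding one pair.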
Thanks for the support. Let's hope that someone can provide an answer. I wanted to add that I will combine Pfam and SMART HMMs with custom-built HMMs, and I would like some general literature or a review on how to choose thresholds depending on the nature of the modelled domains/proteins, without making it too complicated.
Hello, I am facing exactly the same problem, and I am struggling to find any reference with an explicit set of parameters I could use. Did you get any relevant answers to this post? Thanks a lot.
I don't know the answer, but I am very much looking forward to one. Great question!