Hello,
I have a set of protein sequences that I'd like to assign to pre-computed gene families, e.g. those available in Ensembl or Phytozome. The output I'm looking for is a gene family ID for each protein sequence or, for novel sequences, some indication that they do not belong to any known family.
Is there an easy way to achieve this, like a feature in one of these databases or an external tool?
If not, can you suggest ways to perform this analysis? I guess I could just BLAST my sequences against all proteins in the database and assign each one to the family of its best hit (after applying some cutoffs on alignment quality), but is that good enough?
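To make the best-hit idea concrete, here is a minimal Python sketch of what I have in mind. It assumes the BLAST results are in the standard 12-column tabular format (-outfmt 6) and that I can build a protein-to-family lookup from the database; the function name, the lookup, and the cutoff values are placeholders of mine, not anything provided by Ensembl or Phytozome.

```python
import csv
from typing import Dict, Iterable, Optional

# Standard column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def assign_families(
    blast_rows: Iterable[str],
    protein_to_family: Dict[str, str],
    query_ids: Iterable[str],
    max_evalue: float = 1e-5,    # placeholder cutoff, not a recommendation
    min_pident: float = 30.0,    # placeholder cutoff, not a recommendation
    min_length: int = 50,        # placeholder cutoff, not a recommendation
) -> Dict[str, Optional[str]]:
    """Assign each query to the family of its best-scoring hit, or None."""
    best = {}  # query ID -> (bitscore, family ID) of the best passing hit
    for row in csv.reader(blast_rows, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        if float(hit["evalue"]) > max_evalue:
            continue
        if float(hit["pident"]) < min_pident:
            continue
        if int(hit["length"]) < min_length:
            continue
        family = protein_to_family.get(hit["sseqid"])
        if family is None:
            continue  # the subject protein has no family annotation
        score = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or score > best[query][0]:
            best[query] = (score, family)
    # Queries without any passing hit are reported as None (candidate novel).
    return {q: (best[q][1] if q in best else None) for q in query_ids}


if __name__ == "__main__":
    rows = [
        "q1\tpA\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-80\t350",
        "q1\tpB\t40.0\t180\t100\t2\t1\t180\t5\t185\t1e-20\t120",
        "q2\tpB\t25.0\t60\t45\t0\t1\t60\t1\t60\t1e-3\t40",
    ]
    lookup = {"pA": "FAM1", "pB": "FAM2"}
    print(assign_families(rows, lookup, ["q1", "q2", "q3"]))
```

Here q1 would get the family of its highest-bitscore hit, while q2 (hit fails the cutoffs) and q3 (no hits at all) would be flagged as not belonging to any known family. Note that this only looks at a single best hit per query and ignores things like domain architecture or query coverage, which is part of why I'm unsure it is good enough.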
Any advice would be appreciated, thanks!
https://onlinelibrary.wiley.com/doi/full/10.1002/pld3.191