I am trying to group peptide sequences by protein function for further filtering and analysis, and I wonder what the best and most comprehensive source of protein function annotations is. I downloaded NCBI RefSeq and UniProt (both Swiss-Prot and TrEMBL) separately, with the annotations they provide, but there are a few other sets of peptide sequences and annotations (some of them a lot smaller) available from various sources online. Is there a centralized location with almost all protein function annotations, from which I could get the entries as text, MySQL, or some other format? Also, it would be great to be able to cross-reference the matching entries to my existing NCBI RefSeq and UniProt entries.
Any advice would be appreciated.
PS. I am aware of existing posts covering a similar topic, but I did not see any recent comments there, so I'm hoping that a new post will bring up-to-date recommendations.
InterProScan: https://www.ebi.ac.uk/interpro/
GOA? https://www.ebi.ac.uk/GOA
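If you go the GOA route, grouping proteins by function can be sketched roughly as below. This is only a minimal sketch: it assumes the annotations come as a tab-separated GAF file (comment lines start with `!`, the accession is in column 2, the qualifier in column 4, and the GO ID in column 5). The function name `group_by_go_term` and the sample rows are made up for illustration and are not real annotations.

```python
from collections import defaultdict

def group_by_go_term(gaf_lines):
    """Map each GO ID to the set of accessions annotated with it.

    Assumes GAF-style tab-separated lines; rows whose qualifier
    contains NOT are skipped because they are negative annotations.
    """
    groups = defaultdict(set)
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # header/comment or blank line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # malformed row
        accession, qualifier, go_id = cols[1], cols[3], cols[4]
        if "NOT" in qualifier.split("|"):
            continue
        groups[go_id].add(accession)
    return groups

# Toy rows (hypothetical accessions, truncated to the first five columns).
toy_gaf = [
    "!gaf-version: 2.2",
    "UniProtKB\tP00001\tGENE1\tenables\tGO:0005524",
    "UniProtKB\tP00002\tGENE2\tenables\tGO:0005524",
    "UniProtKB\tP00002\tGENE2\tNOT|involved_in\tGO:0006412",
    "UniProtKB\tP00003\tGENE3\tinvolved_in\tGO:0006412",
]
go_groups = group_by_go_term(toy_gaf)
```

For a real download you would pass an open file handle (for example from `gzip.open(path, "rt")`) instead of the toy list.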
I agree with InterProScan. If you want database-search-type functionality, check out HHSuite, particularly hhsearch, hhpred, etc.
Thanks a lot for proposing the sources! I looked at both, and from what I could see, they are both primarily based on UniProt entries. I downloaded UniProt Swiss-Prot and TrEMBL earlier from https://www.uniprot.org/downloads. Do the UniProt annotations include GOA and/or InterPro annotations, or do the InterPro and GOA sets contain more than the UniProt releases do? Is InterPro more complete than GOA? The two seem to be supported by different entities. Does InterPro include what GOA has, or should I try to get both?
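On the cross-referencing part of the original question, one possible approach is to build a RefSeq-to-UniProt lookup from an ID mapping table. The sketch below assumes a three-column, tab-separated layout (UniProt accession, ID type, external ID), as used by UniProt's `idmapping.dat` file, and that RefSeq protein accessions carry the ID type `RefSeq`. The function name `refseq_to_uniprot` and the sample rows are hypothetical.

```python
from collections import defaultdict

def refseq_to_uniprot(idmapping_lines, id_type="RefSeq"):
    """Map each external ID of the given type to its UniProt accessions.

    Assumes lines of the form: accession <TAB> ID type <TAB> external ID.
    One external ID may map to several UniProt accessions, hence sets.
    """
    mapping = defaultdict(set)
    for line in idmapping_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed rows
        uniprot_acc, kind, external_id = parts
        if kind == id_type:
            mapping[external_id].add(uniprot_acc)
    return mapping

# Toy rows with made-up identifiers, for illustration only.
toy_rows = [
    "P00001\tRefSeq\tNP_000001.1",
    "P00001\tGeneID\t12345",
    "P00002\tRefSeq\tNP_000001.1",
    "P00003\tRefSeq\tNP_000002.1",
]
xref = refseq_to_uniprot(toy_rows)
```

With such a lookup, annotations keyed by UniProt accession can be attached to existing RefSeq entries. For the full mapping file, streaming it line by line (for example via `gzip.open(path, "rt")`) avoids loading everything into memory at once.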