Question: categorising proteins into families when you have amino acid sequences
5.0 years ago
United States
How can I categorise my proteins into families when I only have amino acid sequences? I have looked all over the internet for tools or anything I could use. Most of them ask for IDs, but these are newly sequenced genes.

Any ideas?


Well, I have just sequenced and assembled this bacterial strain, and I used the BASys annotation pipeline to annotate the scaffold. I now have a set of 6000 proteins, but I need to know how many of them are involved in, for instance, DNA metabolism, carbohydrate metabolism, etc. I have used blastp to identify some interesting genes. I came across another pipeline, MG-RAST, but I have already used some of the genes annotated by the previous pipeline in a publication, so it is now very hard to change to a different pipeline, unless you advise otherwise.

5.0 years ago
If you have only the amino acid sequences of the proteins, there is not much you can do. You may try to categorize your proteins with Pfam, PROSITE, InterPro, SUPERFAMILY, or CATH. BLASTP may help as well.

edit: based on your updated question, may I suggest the RAST server for annotation? It will generate the summaries you want. I have never used BASys, but from its description it also outputs the information you want; you probably just have to parse the annotation. For example, the GO ID for "DNA metabolic process" is GO:0006259; you just have to find and count how many genes have this tag in the annotation.
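A minimal sketch of that counting step in Python. The input layout is an assumption, since the BASys output format is not shown here: one gene per line, with the gene ID as the first whitespace-separated field and the GO IDs anywhere on the rest of the line. The function names and the example lines are hypothetical.

```python
import re
from collections import defaultdict

# GO identifiers are "GO:" followed by seven digits.
GO_ID = re.compile(r"GO:\d{7}")


def genes_per_go_term(lines):
    """Map each GO ID to the set of gene IDs annotated with it."""
    genes_by_term = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        gene_id = line.split()[0]  # assumption: gene ID is the first field
        for go_id in GO_ID.findall(line):
            genes_by_term[go_id].add(gene_id)
    return genes_by_term


def count_genes(genes_by_term, go_id):
    """Number of distinct genes tagged with exactly this GO ID."""
    return len(genes_by_term.get(go_id, ()))


if __name__ == "__main__":
    # Hypothetical annotation lines, not real BASys output.
    example = [
        "gene0001\tGO:0006259;GO:0003677",
        "gene0002\tGO:0005975",
        "gene0003\tGO:0006259",
    ]
    table = genes_per_go_term(example)
    print(count_genes(table, "GO:0006259"))  # DNA metabolic process
```

For a real file, pass an open file handle instead of the list, e.g. `genes_per_go_term(open("annotation.txt"))`, where the file name is a placeholder. Note that this counts exact matches only: a gene annotated only with a more specific child term of GO:0006259 would not be counted unless you also walk the GO hierarchy.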

Hello, the pipeline I used generates a file with GO terms per gene, as in "", but it does not tell how many genes fall into a certain group like DNA metabolism. I consulted with the guys at the BASys pipeline and they said I would have to use a script to do that. I am a beginner at scripting; would you have any idea how to go about this approach?

5.0 years ago
Cambridge, US
Start here: How To Cluster Sequences Based On Blast Results? Also look into CLANS, which might be what you are looking for. Blast2GO is possibly an alternative.
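As a rough illustration of clustering on BLAST results, here is a minimal single-linkage sketch in Python. It is not the CLANS method and not necessarily what the linked thread recommends. It assumes an all-vs-all blastp run saved in the standard 12-column tabular format (`-outfmt 6`), where column 11 is the e-value. The function name and the `max_evalue` cutoff are hypothetical choices.

```python
def cluster_blast_hits(lines, max_evalue=1e-5):
    """Single-linkage clustering: two sequences share a cluster if they
    are connected by a chain of hits with e-value <= max_evalue."""
    parent = {}  # union-find forest: sequence ID -> parent ID

    def find(seq):
        parent.setdefault(seq, seq)
        while parent[seq] != seq:
            parent[seq] = parent[parent[seq]]  # path halving
            seq = parent[seq]
        return seq

    for line in lines:
        if line.startswith("#"):
            continue  # comment lines, e.g. from -outfmt 7
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a standard tabular hit line
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        # Register both sequences even if the hit is too weak to link them.
        root_q, root_s = find(query), find(subject)
        if evalue <= max_evalue and root_q != root_s:
            parent[root_s] = root_q

    clusters = {}
    for seq in list(parent):
        clusters.setdefault(find(seq), set()).add(seq)
    # Largest clusters first.
    return sorted(clusters.values(), key=len, reverse=True)


if __name__ == "__main__":
    # Hypothetical hits: qseqid sseqid pident length mismatch gapopen
    # qstart qend sstart send evalue bitscore
    hits = [
        "p1\tp2\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200",
        "p2\tp3\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-20\t120",
        "p4\tp5\t30.0\t40\t28\t0\t1\t40\t1\t40\t0.5\t20",
    ]
    for cluster in cluster_blast_hits(hits):
        print(sorted(cluster))
```

Single linkage is simple but tends to merge families that share a single domain, so treat the clusters as a starting point and tune the cutoff for your data.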
