Question: categorising proteins into families when you have amino acid sequences
5.0 years ago by
United States
mwanerhi erfgtr30 wrote:

How can I categorise my proteins into families when I only have amino acid sequences? I have looked all over the internet for tools or anything I could use. Mostly they advise using IDs, but these are newly sequenced genes.

Any ideas?


Well, I have just sequenced and assembled this bacterial strain, so I used the BASys annotation pipeline to annotate the scaffold. Here I am with a set of 6000 proteins, but I need to know how many of these are involved in, for instance, DNA metabolism, carbohydrate metabolism, etc. I have used blastp to identify some interesting genes. I came across another pipeline, MG-RAST, but I have already used some of the genes annotated by the previous pipeline in a publication, so it is now very hard to change to a different pipeline. Otherwise, please advise.

protein categorising • 1.2k views
5.0 years ago by
h.mon30k wrote:

If you have only the amino acid sequences of the proteins, there is not much you can do. You may try to categorize your proteins using Pfam, PROSITE, InterPro, SUPERFAMILY, or CATH. BLASTP may help as well.

Edit: based on your updated question, may I suggest the RAST server for annotation? It will generate the summaries you want. I have never used BASys, but from its description it also outputs the information you want; you probably just have to parse the annotation. For example, the GO ID for "DNA metabolic process" is GO:0006259, so you just have to find and count how many genes carry this tag in the annotation.
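As a rough starting point for the counting step, here is a minimal Python sketch. I do not know the exact layout of the BASys output, so the format is an assumption: one gene per line, the gene ID as the first tab-separated field, and GO IDs somewhere in the rest of the line. The function name and the example records are made up for illustration.

```python
import re
from collections import defaultdict

# GO identifiers have the form "GO:" followed by seven digits.
GO_ID = re.compile(r"GO:\d{7}")

def genes_per_go_term(lines):
    """Map each GO ID to the set of gene IDs annotated with it.

    ASSUMPTION: one gene per line, gene ID in the first tab-separated
    field, GO IDs anywhere in the remainder of the line. Adjust the
    parsing to match the real annotation file.
    """
    genes = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        gene_id, _, rest = line.partition("\t")
        for go_id in GO_ID.findall(rest):
            genes[go_id].add(gene_id)  # a set, so each gene counts once
    return genes

# Made-up example records, for illustration only.
example = [
    "gene0001\tGO:0006259; GO:0005524",
    "gene0002\tGO:0005975",
    "gene0003\tGO:0006259",
]
counts = {go: len(ids) for go, ids in genes_per_go_term(example).items()}
print(counts.get("GO:0006259", 0))
```

For a real file you would pass an open file handle instead of the example list. Note that this only counts genes tagged with exactly the ID you look up; genes annotated only with a more specific child term of "DNA metabolic process" would be missed unless you also walk the GO hierarchy.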


Hello, the pipeline I used generates a file with GO terms per gene, as in "", but it does not tell how many fall into a certain group such as DNA metabolism. I consulted the people behind the BASys pipeline, and they said I would have to use a script to do that. I am a beginner at scripting; would you have any idea how to go about this approach?

written 5.0 years ago by mwanerhi erfgtr30
5.0 years ago by
Cambridge, US
Christian2.9k wrote:
Start with the thread "How To Cluster Sequences Based On Blast Results?" and look into CLANS; that might be what you are looking for. Blast2GO is possibly an alternative.
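To give an idea of what clustering on BLAST results can look like, here is a minimal sketch of single-linkage grouping over an all-against-all blastp run saved in tabular format (`-outfmt 6`, where column 1 is the query, column 2 the subject and column 11 the E-value). This is a much simpler technique than what CLANS does, and the function name, the E-value cutoff and the example rows are assumptions chosen for illustration.

```python
def cluster_blast_hits(rows, max_evalue=1e-10):
    """Group sequences linked by BLAST hits at or below max_evalue.

    Single-linkage: two sequences end up in the same cluster if a chain
    of sufficiently significant hits connects them. The cutoff is an
    arbitrary illustrative default, not a recommended value.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for row in rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a standard 12-column tabular line
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        root_q, root_s = find(query), find(subject)  # registers both IDs
        if evalue <= max_evalue and root_q != root_s:
            parent[root_s] = root_q  # merge the two clusters

    clusters = {}
    for seq in list(parent):
        clusters.setdefault(find(seq), set()).add(seq)
    return sorted(clusters.values(), key=len, reverse=True)

def make_row(query, subject, evalue):
    # Helper that builds a made-up 12-column tabular line for the example.
    return "\t".join([query, subject, "90.0", "100", "0", "0",
                      "1", "100", "1", "100", evalue, "200"])

example = [
    make_row("protA", "protB", "1e-50"),
    make_row("protB", "protC", "1e-30"),
    make_row("protD", "protE", "1e-3"),  # too weak, stays unlinked
]
clusters = cluster_blast_hits(example)
print(clusters[0])
```

Single-linkage tends to chain unrelated families together through multi-domain proteins, so treat the resulting groups as a first look rather than a final family assignment.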