Question: Features of Protein Functions
2.6 years ago, bzamith26 (Brazil) wrote:

Hello!

I need to extract features of protein functions. They are organized in a hierarchy, so one way that I thought about solving this issue was representing each node of this hierarchy as a vector containing its path from root. Something like this: https://imgur.com/a/gmvTv

Does anyone know another way of extracting features of protein functions, ideally something more grounded in biology? I welcome any suggestion. Thank you very much!
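One possible reading of the path-from-root idea, sketched in Python. The hierarchy below is a made-up toy tree (the labels are hypothetical, not real FunCat or GO identifiers), and the exact layout in the linked image may differ. Each node becomes a binary vector with a 1 for every node on its path from the root.

```python
# Toy hierarchy (hypothetical labels): child -> parent; the root has parent None.
PARENT = {
    "root": None,
    "A": "root",
    "B": "root",
    "A.1": "A",
    "A.2": "A",
    "B.1": "B",
}


def path_from_root(node, parent=PARENT):
    """Return the list of nodes from the root down to `node`."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def path_vector(node, parent=PARENT):
    """Binary vector over all nodes (in sorted order): 1 if the node is on the path."""
    on_path = set(path_from_root(node, parent))
    return [1 if n in on_path else 0 for n in sorted(parent)]


print(path_from_root("A.1"))
print(path_vector("A.1"))
```

This assumes a strict tree, where every node has a single parent. Gene Ontology is a directed acyclic graph in which a term can have several parents, so for GO the analogous vector would mark the union of all ancestors rather than a single path.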

modified 2.6 years ago • written 2.6 years ago by bzamith26

What kind of functions are you interested in? Are they Gene Ontology biological process annotations? Are you trying to derive feature vectors representing protein functions? What are you trying to achieve with these features?

written 2.6 years ago by Jean-Karim Heriche

Hi Jean! Thank you for your reply. I want to use machine learning to classify protein functions while making use of interaction data. So I would need both proteins and protein functions described as feature vectors (which I currently have only for proteins). I want to use both the Gene Ontology database and FunCat, which are both hierarchical.

modified 2.6 years ago • written 2.6 years ago by bzamith26

It's still not entirely clear how you plan to use the data. Do you want to use GO and FunCat as input or for validation? Which interaction data do you want to use? Regardless, consider that not all machine learning algorithms require a vector representation. For example, many algorithms (e.g. support vector machines) can work with kernels, and computing a kernel doesn't always require vectors. For examples of kernels derived from a variety of data types (including GO annotations), look at this paper of mine and at this tutorial.
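As a small illustration of a kernel computed without any vector representation, here is a sketch that uses the Jaccard index between sets of annotation terms. This is only one simple choice of kernel and not necessarily one used in the paper or tutorial mentioned above; the protein names and term identifiers are made up for the example.

```python
def jaccard_kernel(a, b):
    """Jaccard similarity between two sets; defined as 1.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def gram_matrix(items, kernel):
    """Pairwise kernel values for a list of items, as a list of rows."""
    return [[kernel(x, y) for y in items] for x in items]


# Hypothetical annotations: protein -> set of term identifiers (placeholders).
ANNOTATIONS = {
    "P1": {"term_a", "term_b", "term_c"},
    "P2": {"term_b", "term_c"},
    "P3": {"term_d"},
}

proteins = sorted(ANNOTATIONS)
K = gram_matrix([ANNOTATIONS[p] for p in proteins], jaccard_kernel)

for name, row in zip(proteins, K):
    print(name, [round(v, 3) for v in row])
```

A matrix like `K` can then be handed to a learner that accepts precomputed kernels, for instance scikit-learn's `SVC(kernel="precomputed")`.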

written 2.6 years ago by Jean-Karim Heriche

[..................]

modified 2.6 years ago • written 2.6 years ago by bzamith26

I don't know the predictive bi-clustering tree algorithm. Could you share a reference? The difficulty with feature-based representations is finding features that are both relevant to the problem at hand and informative. In the case of GO, you could simply create a binary vector covering all the functions you care about, with a 1 for each function a protein is annotated with. As for interaction data, you could use the rows of the graph adjacency matrix as vectors.
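Both suggestions can be sketched in a few lines of Python. The function labels, protein names and interactions below are placeholders invented for the example, and the interaction graph is assumed to be undirected and unweighted.

```python
def binary_function_vector(annotated, functions):
    """1 for each function of interest that the protein is annotated with, else 0."""
    return [1 if f in annotated else 0 for f in functions]


def adjacency_rows(edges, nodes):
    """Map each node to its row of the (undirected, unweighted) adjacency matrix."""
    index = {n: i for i, n in enumerate(nodes)}
    rows = {n: [0] * len(nodes) for n in nodes}
    for u, v in edges:
        rows[u][index[v]] = 1
        rows[v][index[u]] = 1
    return rows


# Hypothetical functions of interest and one protein's annotations.
FUNCTIONS = ["F1", "F2", "F3"]
function_vec = binary_function_vector({"F1", "F3"}, FUNCTIONS)

# Hypothetical interaction network: P1 - P2 - P3.
NODES = ["P1", "P2", "P3"]
EDGES = [("P1", "P2"), ("P2", "P3")]
rows = adjacency_rows(EDGES, NODES)

print(function_vec)
print(rows["P2"])
```

Each adjacency row has one entry per protein in the network, so for large interaction networks these vectors become long and sparse.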

written 2.6 years ago by Jean-Karim Heriche

Here and here are good references on PCTs (Predictive Clustering Trees). Bi-Predictive Clustering Trees are a new idea; I know of a few papers, but they are still under revision. Once they are published, I will update this!

"As for interaction data, you could use the rows of the graph adjacency matrix as vectors." Great suggestion! I'll definitely consider that. Thanks!

modified 2.6 years ago • written 2.6 years ago by bzamith26

Thanks for the links.

written 2.6 years ago by Jean-Karim Heriche