Features of Protein Functions
4.1 years ago
bzamith26 ▴ 10

Hello!

I need to extract features of protein functions. The functions are organized in a hierarchy, so one approach I thought of is to represent each node of the hierarchy as a vector containing its path from the root. Something like this: https://imgur.com/a/gmvTv
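To make the path-from-root idea concrete, here is a minimal sketch. The toy hierarchy, the FunCat-style codes, and the function names `path_from_root` and `path_vector` are all made up for illustration; it also assumes a strict tree with one parent per node.

```python
# Hypothetical toy hierarchy (child -> parent); None marks a top-level node.
# The FunCat-like codes are illustrative only.
PARENT = {
    "01": None,
    "01.01": "01",
    "01.01.03": "01.01",
    "01.02": "01",
    "02": None,
}


def path_from_root(node, parent=PARENT):
    """Return the nodes from the top-level ancestor down to `node`."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def path_vector(node, parent=PARENT):
    """Binary vector with a 1 for every node on the root-to-node path."""
    order = sorted(parent)  # fixed column order shared by all vectors
    on_path = set(path_from_root(node, parent))
    return [1 if n in on_path else 0 for n in order]


print(path_from_root("01.01.03"))  # ['01', '01.01', '01.01.03']
print(path_vector("01.01.03"))     # [1, 1, 1, 0, 0]
```

Gene Ontology is a directed acyclic graph rather than a tree, so a term can have several parents. There the same vector could be built from the set of all ancestors instead of a single path.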

Does anyone know another way of extracting features of protein functions, ideally something more grounded in biology? Any suggestion is welcome. Thank you very much!

protein function protein classification features • 1.0k views

What kind of functions are you interested in? Gene Ontology biological process annotations? Are you trying to derive feature vectors representing protein functions? What are you trying to achieve with these features?


Hi Jean! Thank you for your reply. I want to use machine learning to classify protein functions while making use of interaction data, so I need both proteins and protein functions described as feature vectors (which I currently have only for proteins). I want to use the Gene Ontology database as well as FunCat, both of which are hierarchical.


It's still not entirely clear how you plan to use the data. Do you want to use GO and FunCat as input or for validation? What interaction data do you want to use? Regardless, consider that not all machine learning algorithms require a vector representation. For example, many algorithms (e.g. support vector machines) work with kernels, and computing a kernel doesn't always require vectors. For examples of kernels derived from a variety of data types (including GO annotations), look at this paper of mine and at this tutorial.
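As a rough illustration of a kernel computed without an explicit vector representation, the sketch below compares proteins directly through their sets of GO annotations using Jaccard similarity. The protein names, the annotation sets, and the helper names `jaccard_kernel` and `gram_matrix` are assumptions made up for this example. It is not the kernel from the paper or tutorial mentioned above.

```python
# Hypothetical GO annotations per protein; names and sets are illustrative.
ANNOTATIONS = {
    "P1": {"GO:0006412", "GO:0006414"},
    "P2": {"GO:0006412", "GO:0006457"},
    "P3": {"GO:0007165"},
}


def jaccard_kernel(a, b):
    """Set-overlap similarity |a & b| / |a | b| (1.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def gram_matrix(annotations):
    """Pairwise kernel values, with rows and columns in sorted protein order."""
    names = sorted(annotations)
    matrix = [
        [jaccard_kernel(annotations[p], annotations[q]) for q in names]
        for p in names
    ]
    return names, matrix


names, K = gram_matrix(ANNOTATIONS)
for name, row in zip(names, K):
    print(name, [round(value, 3) for value in row])
```

A matrix like `K` can be passed to kernel methods that accept a precomputed Gram matrix, for example scikit-learn's `SVC(kernel="precomputed")`.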


[..................]


I don't know the predictive bi-clustering tree algorithm; could you share a reference? The difficulty with feature-based representations is finding features that are both relevant to the problem at hand and informative. In the case of GO, you could simply create a binary vector with one entry per function you care about. As for interaction data, you could use the rows of the graph adjacency matrix as vectors.
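Both suggestions can be sketched in a few lines. Everything concrete here is a placeholder: the term list, the annotations, the interaction edges, and the helper names `go_vector` and `adjacency_rows`. The interaction graph is assumed to be undirected and unweighted.

```python
# Hypothetical inputs: the GO terms of interest, per-protein annotations,
# and an undirected protein-protein interaction edge list.
GO_TERMS = ["GO:0006412", "GO:0006457", "GO:0007165"]
ANNOTATIONS = {
    "P1": {"GO:0006412"},
    "P2": {"GO:0006412", "GO:0006457"},
    "P3": {"GO:0007165"},
}
EDGES = [("P1", "P2"), ("P2", "P3")]


def go_vector(protein, terms=GO_TERMS, annotations=ANNOTATIONS):
    """Binary vector: 1 where the protein is annotated with the term."""
    annotated = annotations.get(protein, set())
    return [1 if term in annotated else 0 for term in terms]


def adjacency_rows(edges):
    """Map each protein to its row of the symmetric adjacency matrix."""
    nodes = sorted({node for edge in edges for node in edge})
    index = {node: i for i, node in enumerate(nodes)}
    rows = {node: [0] * len(nodes) for node in nodes}
    for u, v in edges:
        rows[u][index[v]] = 1
        rows[v][index[u]] = 1
    return nodes, rows


nodes, rows = adjacency_rows(EDGES)
print(go_vector("P2"))  # [1, 1, 0]
print(rows["P2"])       # [1, 0, 1]
```

For a real interactome the adjacency matrix is large and mostly zeros, so a sparse representation would probably be a better fit than dense lists.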


Here and here you have good references about PCTs (Predictive Clustering Trees). Bi-Predictive Clustering Trees are a new idea, and I know a few papers but they are under revision. Once they get published, I will update this!

"As for interaction data, you could use the rows of the graph adjacency matrix as vectors." Great suggestion! I'll definitely consider that. Thanks!


Thanks for the links.
