Need pairwise protein dataset with features and GO terms
5.4 years ago
tyhhs1991 • 0

I am currently working on a deep learning method to predict whether two proteins have similar function, given features of the two proteins.

To train this model I need a suitable dataset. The requirements are:

1 it contains enough protein pairs, each with a good number of pairwise features

2 every protein in the dataset has known GO terms (I will use the GO terms to calculate the semantic similarity of the two proteins, and use that as the training label)
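To make the labelling idea in point 2 concrete, here is a minimal sketch. It uses plain Jaccard overlap of the two GO term sets as a stand-in, which is a simplification and not a true ontology-aware semantic similarity measure. The protein names and their annotations are made up for illustration, and `jaccard` and `pair_labels` are hypothetical helper names.

```python
from itertools import combinations


def jaccard(terms_a, terms_b):
    """Jaccard overlap of two GO term sets; 0.0 if both sets are empty."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_labels(annotations):
    """Return {(protein_a, protein_b): similarity} for every unordered pair."""
    return {
        (p, q): jaccard(annotations[p], annotations[q])
        for p, q in combinations(sorted(annotations), 2)
    }


# Hypothetical annotations, for illustration only (not taken from any dataset).
example = {
    "protA": {"GO:0005524", "GO:0004672"},
    "protB": {"GO:0005524", "GO:0005515"},
    "protC": {"GO:0003677"},
}

labels = pair_labels(example)
for pair, score in labels.items():
    print(pair, round(score, 3))
```

A measure that takes the GO graph into account would replace `jaccard` here; the pairing logic would stay the same.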

Is there any dataset that meets these requirements?


So far, the only dataset I have found is this one: http://mine5.ics.uci.edu:1026/gain.html

It was generated from Lindahl's dataset and has pairwise features, but no GO term annotations:

Total number of unique proteins: 976
Total number of query-template pairs: 951600 

Some of the protein names look like this:

1chl-d1chl
1tnfa-d1tnfa
1eac-d1eaf
1gdha-d1gdha2
2avia-d2avia
3pgm-d3pgm
1brnl-d1brnl
1pgga-d1prha2

....

I don't know what this name format means. It looks like a combination of PDB and SCOP identifiers.

If I use this dataset, how can I find the GO terms for each protein in it?

Thanks!

bioinformatics function Go terms protein • 1.5k views

Did you try finding orthologs of your query proteins? If you find orthologs, you can look up their GO terms in the GOA database.
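If the names really are a PDB entry (plus optional chain) followed by a SCOP domain identifier, as the question guesses, then a first step would be to split them into their parts so that the PDB IDs can be looked up. The sketch below rests on that assumed layout, which is not confirmed by the dataset's documentation; `NAME_RE` and `parse_name` are hypothetical names. Once you have PDB IDs and chains, a PDB-to-UniProt mapping such as SIFTS should get you to accessions that can be queried in GOA.

```python
import re

# Assumed layout (a guess, not documented by the dataset):
#   <PDB id><optional chain>-d<PDB id><optional chain><optional domain number>
NAME_RE = re.compile(
    r"^(?P<pdb>[0-9][a-z0-9]{3})(?P<chain>[a-z0-9])?"
    r"-d(?P<scop_pdb>[0-9][a-z0-9]{3})"
    r"(?P<scop_chain>[a-z])?(?P<scop_domain>[0-9])?$"
)


def parse_name(name):
    """Split a dataset name into PDB and SCOP-like parts.

    Parts missing from the name are returned as None.
    Raises ValueError if the name does not fit the assumed layout.
    """
    match = NAME_RE.match(name.strip().lower())
    if match is None:
        raise ValueError(f"unexpected name format: {name!r}")
    return match.groupdict()


# Names copied from the question.
names = [
    "1chl-d1chl", "1tnfa-d1tnfa", "1eac-d1eaf", "1gdha-d1gdha2",
    "2avia-d2avia", "3pgm-d3pgm", "1brnl-d1brnl", "1pgga-d1prha2",
]

parsed = {name: parse_name(name) for name in names}
for name, parts in parsed.items():
    print(name, parts)
```

Note that in some names the two halves point to different PDB entries (for example `1pgga-d1prha2`), so check which half you actually need before looking up annotations.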
