Gene Ontology & Interpro
6.3 years ago
stackf01 ▴ 20

Hello guys. How do I download the complete data sets of protein entries together with their GO annotations (Biological Process, Molecular Function, Cellular Component)? I want to download all of these data sets and load them into a MySQL database.

Second question: how do I download the complete InterPro (domain) data sets that contain fields for superfamily, family and subfamily? Which file should I download there?

Tags: gene, interpro
6.3 years ago

Download the UniProt XML from the FTP site and transform it with the following XSLT stylesheet (uniprot2sqlite.xsl):

For example, with a single entry:

$ rm -f tmp.sqlite3 && curl -L "http://www.uniprot.org/uniprot/O35516.xml" | xsltproc uniprot2sqlite.xsl - | sqlite3 tmp.sqlite3 && sqlite3 tmp.sqlite3 'select * from entry; select * from entry2go;'

1|O35516|NOTC2_MOUSE

1|GO:0009986
1|GO:0005929
1|GO:0005829
1|GO:0005576
1|GO:0005887
1|GO:0016020
1|GO:0005654
1|GO:0005634
1|GO:0005886
1|GO:0043235
1|GO:0005509
1|GO:0019899
1|GO:0051059
1|GO:0060413
1|GO:0046849
1|GO:0007050
1|GO:0001709
1|GO:0016049
1|GO:1990705
1|GO:0061073
1|GO:0042742
1|GO:0007368
1|GO:0030326
1|GO:0072104
1|GO:0072015
1|GO:0001947
1|GO:0072574
1|GO:0006959
1|GO:0001701
1|GO:0002437
1|GO:0072602
1|GO:0035622
1|GO:0070986
1|GO:0001889
1|GO:0072576
1|GO:0002011
1|GO:0035264
1|GO:0043011
1|GO:0008285
1|GO:0000122
1|GO:0007219
1|GO:0009887
1|GO:0060674
1|GO:0001890
1|GO:0043065
1|GO:0030513
1|GO:0008284
1|GO:0045672
1|GO:0046579
1|GO:0072014
1|GO:0003184
1|GO:0006357
1|GO:0006351
1|GO:0042060
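The stylesheet itself is not reproduced in this answer. As a rough stand-in (not the author's stylesheet), the Python sketch below uses only the standard library (`xml.etree.ElementTree` and `sqlite3`) to load the same two kinds of rows from a UniProt XML file. The table and column names (`entry`, `entry2go`, `entry_id`, `go_id`) are assumptions chosen to mirror the output above, and the XML snippet is a trimmed example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# UniProt XML elements live in this namespace.
NS = {"up": "http://uniprot.org/uniprot"}

# Trimmed example of a UniProt entry (only the fields used below).
SAMPLE_XML = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry dataset="Swiss-Prot">
    <accession>O35516</accession>
    <name>NOTC2_MOUSE</name>
    <dbReference type="GO" id="GO:0009986">
      <property type="term" value="C:cell surface"/>
    </dbReference>
    <dbReference type="GO" id="GO:0005509">
      <property type="term" value="F:calcium ion binding"/>
    </dbReference>
    <dbReference type="Pfam" id="PF00008"/>
  </entry>
</uniprot>"""


def load_uniprot_xml(xml_text: str, conn: sqlite3.Connection) -> None:
    """Fill `entry` and `entry2go` tables from UniProt XML text."""
    # Hypothetical schema mirroring the output above; the stylesheet's
    # real schema is not shown in the thread.
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS entry (
            id INTEGER PRIMARY KEY,
            accession TEXT NOT NULL,
            name TEXT
        );
        CREATE TABLE IF NOT EXISTS entry2go (
            entry_id INTEGER REFERENCES entry(id),
            go_id TEXT NOT NULL
        );
        """
    )
    root = ET.fromstring(xml_text)
    for entry in root.findall("up:entry", NS):
        # The first <accession> element is the primary accession.
        accession = entry.findtext("up:accession", namespaces=NS)
        name = entry.findtext("up:name", namespaces=NS)
        cur = conn.execute(
            "INSERT INTO entry (accession, name) VALUES (?, ?)", (accession, name)
        )
        entry_id = cur.lastrowid
        # Keep only cross-references whose type is GO.
        for ref in entry.findall("up:dbReference[@type='GO']", NS):
            conn.execute(
                "INSERT INTO entry2go (entry_id, go_id) VALUES (?, ?)",
                (entry_id, ref.get("id")),
            )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_uniprot_xml(SAMPLE_XML, conn)
    for row in conn.execute("SELECT * FROM entry"):
        print(row)
    for row in conn.execute("SELECT * FROM entry2go"):
        print(row)
```

To target MySQL instead, the same INSERT statements can go through a DB-API driver such as PyMySQL, whose placeholders are `%s` rather than `?`; `executescript` is specific to `sqlite3`. For the full UniProt XML, `ET.iterparse` would avoid holding the whole file in memory.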

6.3 years ago
me ▴ 740

You can download this information directly from the UniProt web service at www.uniprot.org.

Use the "Customize columns" button to select the columns you want to download.

Then choose a tab- or comma-separated download (also select "compressed" for best results).
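Once the tab-separated file is downloaded, the GO column still has to be split into one row per term before it fits an `entry2go`-style table. The sketch below assumes a file with an `Entry` column and a `Gene Ontology IDs` column whose values are separated by `; `; the exact header names depend on the columns you select, so treat them as assumptions and check them against the first line of your file.

```python
import csv
import gzip
import io
from typing import Iterable, Iterator, Tuple

# Assumed header names; adjust to match your download.
ACCESSION_COL = "Entry"
GO_COL = "Gene Ontology IDs"


def go_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (accession, GO id) pairs from a tab-separated UniProt download."""
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        accession = row[ACCESSION_COL]
        # All GO IDs arrive in one cell, e.g. "GO:0005509; GO:0009986".
        for go_id in (row.get(GO_COL) or "").split(";"):
            go_id = go_id.strip()
            if go_id:
                yield accession, go_id


def go_pairs_from_file(path: str) -> Iterator[Tuple[str, str]]:
    """Read a plain or gzip-compressed TSV file (chosen by the .gz suffix)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", newline="") as handle:
        yield from go_pairs(handle)


if __name__ == "__main__":
    # Second row is a made-up entry with no GO terms.
    sample = io.StringIO(
        "Entry\tEntry Name\tGene Ontology IDs\n"
        "O35516\tNOTC2_MOUSE\tGO:0005509; GO:0009986\n"
        "P00000\tEXAMPLE_HUMAN\t\n"
    )
    for pair in go_pairs(sample):
        print(pair)
```

Each pair maps directly onto one row of a two-column link table, so the pairs can be fed to `executemany` in batches.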

The result files can be large, so you may want to write a script that uses the offset and limit parameters to page through the results.
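A paging script only needs to move the offset forward by the page size until a page comes back short. The sketch below keeps the paging logic separate from the HTTP call: `fetch_page` is a hypothetical callable you supply (for example, one that requests the search URL with `offset` and `limit` parameters and returns the data rows), so the endpoint and parameter wiring are assumptions, not a documented UniProt API. UniProt's current REST service may paginate differently, so check its documentation before relying on offset/limit.

```python
from typing import Callable, Iterator, List

# Hypothetical signature: fetch_page(offset, limit) -> data rows for that
# page (header excluded). Wire it to your own HTTP client and URL.
FetchPage = Callable[[int, int], List[str]]


def iter_rows(fetch_page: FetchPage, limit: int = 10000) -> Iterator[str]:
    """Yield every row by requesting successive pages of `limit` rows."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < limit:
            break
        offset += limit


if __name__ == "__main__":
    # Stand-in for the web service: 25 fake rows served in slices.
    fake_rows = [f"row{i}" for i in range(25)]

    def fake_fetch(offset: int, limit: int) -> List[str]:
        return fake_rows[offset:offset + limit]

    rows = list(iter_rows(fake_fetch, limit=10))
    print(len(rows))  # 25
```

Because `iter_rows` is a generator, rows can be written to the database page by page instead of being held in memory.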

Unlike the answer that uses the XML from the FTP site, this gives all current Gene Ontology annotations, not just those made by the UniProt consortium at the time of the UniProt release, so it can contain somewhat more information than the XML file.


For UniProt, how do I find which term is the parent node in the ontology?
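One possible approach: the UniProt data above gives GO term IDs but not the hierarchy, which can instead be read from the Gene Ontology's own OBO file (for example a locally downloaded go-basic.obo), where each `[Term]` stanza lists its parents on `is_a:` lines. The three top-level nodes are GO:0008150 (biological_process), GO:0003674 (molecular_function) and GO:0005575 (cellular_component). The sketch below assumes that plain OBO layout and follows only `is_a` links, ignoring `relationship:` lines such as `part_of`; the `EX:` IDs in the sample are made up for illustration.

```python
from typing import Dict, Iterable, List, Optional, Set

# Made-up terms (EX: prefix) in OBO layout, for illustration only.
SAMPLE_OBO = """format-version: 1.2

[Term]
id: EX:0000000
name: example root

[Term]
id: EX:0000001
name: example parent
is_a: EX:0000000 ! example root

[Term]
id: EX:0000002
name: example child
is_a: EX:0000001 ! example parent

[Typedef]
id: part_of
name: part of
"""


def parse_is_a(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each [Term] id to the ids on its is_a: lines (direct parents)."""
    parents: Dict[str, List[str]] = {}
    in_term = False
    term_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            term_id = None
        elif not in_term:
            continue
        elif line.startswith("id:"):
            term_id = line[len("id:"):].strip()
            parents.setdefault(term_id, [])
        elif line.startswith("is_a:") and term_id is not None:
            # "is_a: GO:0008150 ! biological_process" -> "GO:0008150"
            parent = line[len("is_a:"):].split("!", 1)[0].strip()
            parents[term_id].append(parent)
    return parents


def ancestors(term_id: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable from term_id by following is_a links upward."""
    seen: Set[str] = set()
    stack = list(parents.get(term_id, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


if __name__ == "__main__":
    parents = parse_is_a(SAMPLE_OBO.splitlines())
    print(parents["EX:0000002"])
    print(sorted(ancestors("EX:0000002", parents)))
```

With a real file, `parse_is_a(open("go-basic.obo"))` yields the direct parents of every term, and a term with an empty parent list is a root.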