Question: How To Get The GO Terms Associated With A Protein?
Sirus wrote:

Hello everybody, I am a computer scientist who has started working in bioinformatics, but I am having trouble finding resources. My problem is as follows: I have a protein-protein interaction network and I want to find, for each protein, the list of GO terms associated with it. I have seen that Bioconductor packages have tools that can help with this. Does anyone have an idea how to do it? Thank you in advance.

Tags: gene, R, ppi, bioconductor • 2.3k views
written 7.9 years ago by Sirus • modified 7.7 years ago by Pierre Lindenbaum
Neilfws wrote:

If you have a list of identifiers (such as protein sequence IDs), you want another list of identifiers (such as GO terms) and you're working with a commonly-used organism (such as humans), then BioMart is a good option.

The IDs that you have are termed "filters"; those that you want are termed "attributes". You can use BioMart via the web interface, and there are also other tools, in particular the R biomaRt package.

Here's a brief example, showing how you could connect human protein HGNC symbols with GO:

library(biomaRt)
# connect to the Ensembl BioMart and select the human genes dataset
mart <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
hgnc <- c("EPB41L3", "RAB31", "TUBB6", "ADAMTS1", "CFD", "CLDN8")
# query biomart: filter on HGNC symbol, return the symbol and its GO biological process IDs
results <- getBM(attributes = c("hgnc_symbol", "go_biological_process_id"),
                 filters = "hgnc_symbol", values = hgnc, mart = mart)
results

   hgnc_symbol go_biological_process_id
1      EPB41L3               GO:0008150
2      EPB41L3               GO:0030866
3        RAB31               GO:0015031
4        RAB31               GO:0007264
5        RAB31               GO:0048193
6        RAB31               GO:0006886
7        RAB31               GO:0006913
8        RAB31               GO:0007165
9        TUBB6               GO:0007018
10       TUBB6               GO:0051258
11       TUBB6               GO:0007017
12     ADAMTS1               GO:0006508
13     ADAMTS1               GO:0001542
14     ADAMTS1               GO:0060347
15     ADAMTS1               GO:0007229
16     ADAMTS1               GO:0001822
17     ADAMTS1               GO:0008285
18     ADAMTS1                         
19       CLDN8               GO:0016338
20         CFD               GO:0006957
21         CFD               GO:0006956
22         CFD               GO:0006508
23         CFD               GO:0007219
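
The result above is in "long" format, one row per symbol/GO pair. Since the question asks for a list of terms per protein, here is a minimal sketch of one way to regroup it with base R. The data frame is a hand-copied subset of the rows shown above (not a fresh query), and the names `annotated` and `go_by_protein` are only illustrative:

```r
# Toy data frame with the same two columns as the getBM() result above.
# These rows are copied by hand from the printed output; this is an
# assumption-free subset, not the result of a new BioMart query.
results <- data.frame(
  hgnc_symbol = c("EPB41L3", "EPB41L3", "ADAMTS1", "ADAMTS1", "CLDN8"),
  go_biological_process_id = c("GO:0008150", "GO:0030866",
                               "GO:0006508", "", "GO:0016338"),
  stringsAsFactors = FALSE
)

# Drop rows where the GO ID column is empty (such as row 18 above)
annotated <- results[results$go_biological_process_id != "", ]

# Build a named list: one character vector of GO IDs per gene symbol
go_by_protein <- split(annotated$go_biological_process_id,
                       annotated$hgnc_symbol)

# Look up the GO IDs for a single protein
go_by_protein[["EPB41L3"]]
```

With the real `results` object from `getBM()`, the toy data frame can be skipped and the last three statements applied directly.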

You can use the biomaRt functions listAttributes() and listFilters() to see the available options. For example, to see the attributes related to GO:

# listAttributes() needs the mart object created earlier
a <- listAttributes(mart)
a[grep("GO", a$description),]

24                  go_biological_process_id     GO Term Accession (bp)
25                                 name_1006          GO Term Name (bp)
26                           definition_1006    GO Term Definition (bp)
27        go_biological_process_linkage_type GO Term Evidence Code (bp)
28                  go_cellular_component_id     GO Term Accession (cc)
29       go_cellular_component__dm_name_1006          GO Term Name (cc)
30 go_cellular_component__dm_definition_1006    GO Term Definition (cc)
31        go_cellular_component_linkage_type GO Term Evidence Code (cc)
32                  go_molecular_function_id          GO Term Accession
33       go_molecular_function__dm_name_1006          GO Term Name (mf)
34 go_molecular_function__dm_definition_1006    GO Term Definition (mf)
35        go_molecular_function_linkage_type GO Term Evidence Code (mf)
36                      goslim_goa_accession    GOSlim GOA Accession(s)
37                    goslim_goa_description     GOSlim GOA Description
written 7.9 years ago by Neilfws

Thank you, this seems to be the answer I was looking for. I will try it.

written 7.9 years ago by Sirus
Pierre Lindenbaum wrote:

Using the UCSC public MySQL server with the gene symbols from Neil's example:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg18 -e '
select T.acc,T.name,T.term_type, X.geneSymbol
from go.term as T,
go.goaPart as GOA,
hg18.kgXref as X
where
T.acc=GOA.goId and
GOA.dbObjectSymbol=X.spDisplayId and 
X.geneSymbol in ("EPB41L3", "RAB31", "TUBB6", "ADAMTS1", "CFD", "CLDN8")
'

+------------+---------------------------------------------------------+--------------------+------------+
| acc        | name                                                    | term_type          | geneSymbol |
+------------+---------------------------------------------------------+--------------------+------------+
| GO:0004222 | metalloendopeptidase activity                           | molecular_function | ADAMTS1    | 
| GO:0005178 | integrin binding                                        | molecular_function | ADAMTS1    | 
| GO:0005576 | extracellular region                                    | cellular_component | ADAMTS1    | 
| GO:0005578 | proteinaceous extracellular matrix                      | cellular_component | ADAMTS1    | 
| GO:0006508 | proteolysis                                             | biological_process | ADAMTS1    | 
| GO:0007229 | integrin-mediated signaling pathway                     | biological_process | ADAMTS1    | 
| GO:0008201 | heparin binding                                         | molecular_function | ADAMTS1    | 
| GO:0008233 | peptidase activity                                      | molecular_function | ADAMTS1    | 
| GO:0008237 | metallopeptidase activity                               | molecular_function | ADAMTS1    | 
| GO:0008270 | zinc ion binding                                        | molecular_function | ADAMTS1    | 
| GO:0008285 | negative regulation of cell proliferation               | biological_process | ADAMTS1    | 
| GO:0016787 | hydrolase activity                                      | molecular_function | ADAMTS1    | 
| GO:0031012 | extracellular matrix                                    | cellular_component | ADAMTS1    | 
| GO:0046872 | metal ion binding                                       | molecular_function | ADAMTS1    | 
| GO:0003817 | complement factor D activity                            | molecular_function | CFD        | 
| GO:0003824 | catalytic activity                                      | molecular_function | CFD        | 
| GO:0004252 | serine-type endopeptidase activity                      | molecular_function | CFD        | 
| GO:0005576 | extracellular region                                    | cellular_component | CFD        | 
| GO:0006508 | proteolysis                                             | biological_process | CFD        | 
| GO:0006955 | immune response                                         | biological_process | CFD        | 
| GO:0006956 | complement activation                                   | biological_process | CFD        | 
| GO:0006957 | complement activation, alternative pathway              | biological_process | CFD        |
written 7.9 years ago by Pierre Lindenbaum

How do I access the UCSC SQL server?

written 7.9 years ago by Sirus

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg18

written 7.9 years ago by Pierre Lindenbaum

Thank you, I will try it :)

written 7.9 years ago by Sirus