Question

How to convert bulk UniProt Id to GO terms/Ids?

3

7.5 years ago

mirzaei86.vahid ▴ 50

Hello

I have worked on a transcriptome and obtained UniProt IDs from blastx output (nearly 20K UniProt accessions). For my project I need to run GO analysis and pathway analysis on them, and I cannot use Trinotate because I did the analysis with different software.

How can I extract GO IDs/terms from a bulk list of UniProt accessions, and then run enrichment on them?

Thanks

Uniprot RNA-Seq Assembly • 19k views

updated 3.2 years ago by Pratik ★ 1.0k • written 7.5 years ago by mirzaei86.vahid ▴ 50

Ram · Answer 1 · 2017-02-27

8

7.5 years ago

Elisabeth Gasteiger ★ 2.4k

To extract GO terms for a list of UniProtKB identifiers, use the UniProt batch retrieve tool as suggested above, but instead of mapping UniProtKB IDs to an external database, map from UniProtKB to UniProtKB.

Once you have your result, you can click on "Columns" and customize your result table layout, as described here or here.

The customization interface contains a section "Gene Ontology", where you can select to see a complete list, or separate columns for the 3 ontologies molecular function, biological process or cellular component, or a list of identifiers only.

You can remove all columns that are not of interest in this context, and then download the results in tab-delimited format.

Or you can access the UniProt website programmatically, with one query per accession number: for a given UniProtKB identifier, e.g. Q9ZUA2, you can use this URL

http://www.uniprot.org/uniprot/?query=Q9ZUA2&format=tab&columns=id%2Cgo
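For a scripted version of this one-query-per-accession idea, here is a minimal Python sketch. It assumes the current UniProt REST endpoint `https://rest.uniprot.org/uniprotkb/search` with `format=tsv` and the return fields `accession` and `go_id`; none of these appear in the answer above, so check them against the UniProt API documentation before relying on them. The helper names `build_go_query_url` and `parse_go_tsv` are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the current UniProt REST API (not taken from the answer above).
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_go_query_url(accessions, fields=("accession", "go_id")):
    """Build a search URL asking for GO IDs of the given accessions as TSV."""
    query = " OR ".join(f"accession:{acc}" for acc in accessions)
    params = {"query": query, "format": "tsv", "fields": ",".join(fields)}
    return f"{BASE_URL}?{urlencode(params)}"


def parse_go_tsv(text):
    """Parse a TSV response: header line, then accession and ';'-separated GO IDs."""
    lines = text.strip().splitlines()
    result = {}
    for line in lines[1:]:  # skip the header row
        cols = line.split("\t")
        go_field = cols[1] if len(cols) > 1 else ""
        result[cols[0]] = [g.strip() for g in go_field.split(";") if g.strip()]
    return result


if __name__ == "__main__":
    print(build_go_query_url(["Q9ZUA2"]))
```

The URL builder and the parser are kept separate from the actual download, so the response text can come from any HTTP client or from a file saved through the website. For a list of about 20K accessions, the batch retrieve tool described above is likely the more practical route.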

Please don't hesitate to contact the UniProt helpdesk if you have any additional questions.

updated 3.2 years ago by Ram 44k • written 7.5 years ago by Elisabeth Gasteiger ★ 2.4k

@Elisabeth Nice description. As I understand it, apart from annotation, 'mirzaei86.vahid' also wants to perform enrichment analysis.

7.5 years ago by EagleEye 7.6k

Hi Elisabeth, thanks for your help.

7.5 years ago by mirzaei86.vahid ▴ 50

Hello Elisabeth Gasteiger,

It was a very helpful explanation that you gave. Could you please help me too by letting me know how I could get the complete GO terms using UniProt IDs? The full list is given below: for each query (UniProt ID), it returns TRUE/FALSE under the following headings:

GOBP_Biological regulation, GOBP_Cellular process, GOBP_Developmental process, GOBP_Growth, GOBP_Immune system process, GOBP_Interaction with cells and organisms, GOBP_Localization, GOBP_Metabolic process, GOBP_Regulation, GOBP_Reproduction, GOBP_Response to stimulus, GOBP_Other

GOCC_Endosome, GOCC_Chromosome, GOCC_Ribosome, GOCC_Golgi, GOCC_ER, GOCC_Mitochondria, GOCC_Nucleus, GOCC_Peroxisome/microbody, GOCC_Cytoskeleton, GOCC_Plasma membrane, GOCC_Cell surface, GOCC_Extracellular, GOCC_Other intracellular organelles, GOCC_Other cytoplasmic vesicle, GOCC_Macromolecular complex, GOCC_Cytoplasm, GOCC_Other

GOMF_Antioxidant Activity, GOMF_Binding, GOMF_Catalytic Activity, GOMF_Enzyme regulator activity, GOMF_Molecular transducer activity, GOMF_Structural molecule activity, GOMF_Transcription regulator activity, GOMF_Translation regulator activity, GOMF_Transporter activity, GOMF_Chaperone activity, GOMF_Motor activity, GOMF_Other

updated 3.2 years ago by Ram 44k • written 6.5 years ago by vipulbatra.pu ▴ 10

I'm afraid I don't quite understand what you are trying to do. What is your input? Please don't hesitate to contact the UniProt helpdesk with your question (or open a new thread in BioStars).

6.5 years ago by Elisabeth Gasteiger ★ 2.4k

I have a list of proteins. Someone helped me with GO results as shown in the two pictures linked below (the second continues the first). I am not sure what tool they used. I want to arrange other lists of proteins in the same format.

https://ibb.co/ntiKan https://ibb.co/dr3Kan

My input is UniProt IDs. When I map them using UniProt, I get only the 3 GO components, not a complete TRUE/FALSE list like the one in the images I attached.

6.5 years ago by vipulbatra.pu ▴ 10

score 2 · Answer 2 · 2021-06-03

I've been doing something similar recently. Here's a way to build a data frame with the UniProt ID and all the associated gene ontology information. I'm not sure this is exactly what is needed, and of course you'd still have to figure out a way to graph the information...

Download this and extract: http://geneontology.org/gene-associations/goa_human.gaf.gz

I extracted to my desktop. The following should make the data frame.

# Drop the GAF header lines (they start with "!") rather than assuming a fixed header length
system("grep -v '^!' ~/Desktop/goa_human.gaf > ~/Desktop/goa_human_no_header.txt")
GO <- read.delim("~/Desktop/goa_human_no_header.txt", header = FALSE, sep = "\t", quote = "")

# Keep only column 2 (UniProt accession) and column 5 (GO ID)
GO <- GO[, c(2, 5)]
colnames(GO) <- c("UNIPROTID", "GOID")

# Install GO.db on first use, then load the GO term table
if (!requireNamespace("GO.db", quietly = TRUE)) BiocManager::install("GO.db")
library(GO.db)
GOdb <- as.data.frame(GOTERM)
colnames(GOdb)[1] <- "GOID"
# Drop the last row, as in the original snippet
GOdb <- head(GOdb, -1)

# Attach term, ontology and definition to every accession/GO pair
UPwithGO <- merge(GO, GOdb, by = "GOID")
rm(GOdb, GO)
# Remove the duplicated go_id column left over from GOTERM
UPwithGO$go_id <- NULL

It's kind of messy to be honest, but I tried lol
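If you'd rather not go through R, the accession-to-GO part of the above can be sketched in plain Python. This assumes the standard GAF layout, where header lines start with `!`, column 2 is the UniProt accession and column 5 is the GO ID; `parse_gaf` is a hypothetical helper name, and it only gives GO IDs, not the term names that GO.db adds.

```python
from collections import defaultdict


def parse_gaf(lines):
    """Map each accession (GAF column 2) to the set of its GO IDs (GAF column 5)."""
    mapping = defaultdict(set)
    for line in lines:
        # Skip blank lines and GAF header/comment lines
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # malformed row, ignore
        mapping[cols[1]].add(cols[4])
    return dict(mapping)


if __name__ == "__main__":
    example = [
        "!gaf-version: 2.2\n",
        "UniProtKB\tP12345\tGENE1\tenables\tGO:0003677\n",
        "UniProtKB\tP12345\tGENE1\tlocated_in\tGO:0005634\n",
    ]
    print(parse_gaf(example))
```

To run it on the real file you could pass an open file handle, for example `gzip.open("goa_human.gaf.gz", "rt")`, so there's no need to extract it first.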

score 1 · Answer 3 · 2017-02-25

7.5 years ago

EagleEye 7.6k

1) Convert your UniProt IDs to gene names (HGNC symbols) or Entrez Gene IDs using UniProt ID mapping.

2) Use the Entrez IDs or gene names (symbols) in GeneSCF for enrichment analysis (KEGG and GO) or annotation.
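To give an idea of what an enrichment step computes, here is a generic over-representation test for a single GO term, written as a one-sided hypergeometric tail probability. It is only an illustrative sketch and not GeneSCF's own code; the function name and argument names are hypothetical, and a real analysis would also need multiple-testing correction across terms.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a list of n,
    when K of the N background genes carry the GO term."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


if __name__ == "__main__":
    # 8 of 50 genes of interest carry a term that annotates 200 of 20000 background genes
    print(hypergeom_enrichment_p(8, 50, 200, 20000))
```

A small p-value means the term shows up in the gene list more often than expected from the background.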

score 0 · Answer 4 · 2017-02-28

7.5 years ago

Pallab Bhowmick ▴ 20

You can also use EBI QuickGO tools to fetch GO terms/ID programmatically.
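A rough sketch of what that could look like in Python follows. It assumes the QuickGO annotation search endpoint `https://www.ebi.ac.uk/QuickGO/services/annotation/search` with the `geneProductId` and `limit` parameters, and a JSON response whose `results` entries carry `geneProductId` and `goId`. These details are assumptions and should be checked against the QuickGO API documentation; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumed QuickGO endpoint and parameter names; verify against the QuickGO API docs.
QUICKGO_SEARCH = "https://www.ebi.ac.uk/QuickGO/services/annotation/search"


def build_quickgo_url(accessions, limit=100):
    """Build an annotation search URL for a small batch of UniProt accessions."""
    params = {"geneProductId": ",".join(accessions), "limit": limit}
    return f"{QUICKGO_SEARCH}?{urlencode(params)}"


def go_ids_from_response(payload):
    """Collect GO IDs per accession from a JSON response body (as text)."""
    data = json.loads(payload)
    mapping = {}
    for hit in data.get("results", []):
        # Assumed form of the identifier: "UniProtKB:P12345"
        acc = hit["geneProductId"].split(":")[-1]
        mapping.setdefault(acc, set()).add(hit["goId"])
    return mapping


if __name__ == "__main__":
    print(build_quickgo_url(["P12345", "Q9ZUA2"]))
```

The download itself is left out so the sketch runs offline. Results are paged, so a full list of accessions would have to be sent in batches and each batch read page by page.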

score 0 · Answer 5 · 2017-08-28

Dear mirzaei86.vahid,

you can use the query functions of the Python library pyuniprot.

Install it (with pip or git clone) and run an update. Find out which taxonomy identifiers match your organisms; the example here uses human, mouse and rat. Don't run a full update for all organisms, as it takes very long.

Python code:

import pyuniprot
# Load only human (9606), mouse (10090) and rat (10116) entries
pyuniprot.update(taxids=[9606, 10090, 10116])

Then use the following Python code for your problem, assuming 1433E_HUMAN and A4_HUMAN are the identifiers you are looking for:

Python code:

import pyuniprot

query = pyuniprot.query()
# Look up the entries by their UniProt entry names
entries = query.entry(name=('1433E_HUMAN', 'A4_HUMAN'))
# First accession listed for each entry
first_accessions = [entry.accessions[0] for entry in entries]
# GO cross-references of the same entries
gos = query.db_reference(entry_name=('1433E_HUMAN', 'A4_HUMAN'), type_='GO')
go_ids = [x.identifier for x in gos]

Best regards