Question: Extract information from UniProt
Learner wrote:

I am wondering whether anyone knows of a program or script that can retrieve information for over 100 genes. Specifically, I want the information related to "Biological process", "Molecular function" and "Cellular component".

Thanks a bunch

Tag: genome
(question by Learner, 9 days ago; edited 8 days ago by sammer.kamal91)

Can you explain what your input is? The answer may be as simple as a grep on a file in ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN, but I can't tell from your question.

(reply by Alex Reynolds, 9 days ago)

@Alex Reynolds The input can be either protein names or gene names. For instance, let's use this list of 7 human genes:

ERVMER34-1
BMP4 
DNAJA1
ELANE
GZMB
RACK1
DNAJB1
(reply by Learner, 9 days ago)

https://www.uniprot.org/uniprot/?query=gene:BMP4+AND+reviewed:yes+AND+organism:9606#goViewBy
https://www.uniprot.org/uniprot/?query=gene:ELANE+AND+reviewed:yes+AND+organism:9606#goViewBy

Construct others as needed.
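To avoid building these links by hand, the same URL pattern can be generated for every gene in a list. A minimal shell sketch, assuming a plain-text file with one gene symbol per line. These are the query URLs used in this thread; the UniProt website has since been redesigned and may redirect or reinterpret them.

```shell
#!/usr/bin/env bash
# Print one UniProt query URL per gene symbol, reusing the query pattern from
# the reply above: gene:<SYMBOL> AND reviewed:yes AND organism:9606 (human).

print_uniprot_urls() {
  local list_file="$1" gene
  while IFS= read -r gene || [ -n "$gene" ]; do
    # Drop carriage returns and surrounding whitespace ("BMP4 " -> "BMP4").
    gene="$(printf '%s' "$gene" | tr -d '\r' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')"
    [ -z "$gene" ] && continue   # skip blank lines
    printf 'https://www.uniprot.org/uniprot/?query=gene:%s+AND+reviewed:yes+AND+organism:9606#goViewBy\n' "$gene"
  done < "$list_file"
}

# Only run when a list file is given on the command line.
if [ "$#" -gt 0 ]; then
  print_uniprot_urls "$1"
fi
```

For example, saved under a name of your choosing such as uniprot_urls.sh, "bash uniprot_urls.sh /tmp/list.txt" prints one link per gene. This still means opening each page, so it mainly helps for spot checks.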

(reply by genomax, 9 days ago)

@genomax This requires going through UniProt one gene at a time and copying and pasting the information from each page. That is impractical with 100 or more genes. Do you know a better way?

(reply by Learner, 9 days ago)

These queries can be constructed programmatically; UniProt has help on doing this here. They may also have a downloadable file on their FTP site that could be queried. As Alex said, other resources may have this information more readily available.
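A rough sketch of the programmatic route: build one query for the whole list and download the GO columns as a table. It assumes UniProt's current REST API at rest.uniprot.org (newer than this thread), its /uniprotkb/stream endpoint, the gene_exact, reviewed and organism_id query fields, and the go_p, go_f and go_c return fields (Biological Process, Molecular Function, Cellular Component). Check these names against UniProt's API documentation before relying on them.

```shell
#!/usr/bin/env bash
# Sketch: fetch GO annotations for a list of human gene symbols in one request.
# ASSUMPTION: UniProt's REST API at rest.uniprot.org, the /uniprotkb/stream
# endpoint, and the go_p/go_f/go_c return fields; verify before use.

build_query() {
  # Turn the list into (gene_exact:A OR gene_exact:B ...), trimming whitespace
  # and skipping blank lines, then restrict to reviewed human entries.
  local list_file="$1" terms
  terms="$(tr -d '\r' < "$list_file" \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | awk 'NF { printf "%sgene_exact:%s", (n++ ? " OR " : ""), $0 }')"
  printf '(%s) AND reviewed:true AND organism_id:9606\n' "$terms"
}

fetch_go_tsv() {
  # -G sends the --data-urlencode pairs as a URL-encoded GET query string.
  curl -sS -G 'https://rest.uniprot.org/uniprotkb/stream' \
    --data-urlencode "query=$(build_query "$1")" \
    --data-urlencode 'fields=accession,gene_primary,go_p,go_f,go_c' \
    --data-urlencode 'format=tsv'
}

# Only run when a list file is given on the command line.
if [ "$#" -gt 0 ]; then
  fetch_go_tsv "$1"
fi
```

Usage, with a script name of your choosing: "bash fetch_go.sh /tmp/list.txt > go_annotations.tsv". A single OR query keeps this to one request for about 100 genes; for much longer lists, splitting the list into batches would keep the URL at a manageable length.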

(reply by genomax, 9 days ago)

Try searching Google for "retrieve uniprot mapping". Any luck?

Tell us what identifiers and file formats you have. Print the head of your list or file.

(reply by Biogeek, 9 days ago)

@Biogeek I gave an example above, a list of genes, and of course I could not find anything on Google. Please use the following gene names as an example:

ERVMER34-1
BMP4 
DNAJA1
ELANE
GZMB
RACK1
DNAJB1

The format can be txt, xls or anything else, if needed.

(reply by Learner, 9 days ago)
Alex Reynolds wrote:

Given a list of IDs:

$ cat /tmp/list.txt 
ERVMER34-1
BMP4 
DNAJA1
ELANE
GZMB
RACK1
DNAJB1

Grab the GAF file of UniProt ID-to-GO mappings:

$ wget -qO- ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/goa_human.gaf.gz | gunzip -c > /tmp/goa_human.gaf

Query your list of identifiers:

$ grep -wf /tmp/list.txt /tmp/goa_human.gaf > /tmp/query_results.txt
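One caveat with grep -wf: it matches a symbol anywhere on the line, so rows for a different gene product that lists the symbol as a synonym can also come back. In addition, a list entry with a trailing space, such as "BMP4 " above, only matches where a literal space follows the symbol, so it will not match BMP4 in the tab-separated symbol column. A sketch of a stricter filter, assuming the GAF 2.x layout where column 3 is the DB Object Symbol:

```shell
#!/usr/bin/env bash
# Keep GAF rows whose column 3 (DB Object Symbol) exactly equals a listed gene.
# List entries are trimmed first, so "BMP4 " and "BMP4" behave the same.

filter_gaf_by_symbol() {
  local list_file="$1" gaf_file="$2"
  awk -F'\t' '
    NR == FNR {                          # first file: the gene list
      gsub(/\r/, "")
      sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "")
      if ($0 != "") want[$0] = 1
      next
    }
    /^!/ { next }                        # skip GAF header lines
    ($3 in want)                         # print rows for listed symbols
  ' "$list_file" "$gaf_file"
}

# Only run when both files are given on the command line.
if [ "$#" -ge 2 ]; then
  filter_gaf_by_symbol "$1" "$2"
fi
```

For example, "bash filter_gaf.sh /tmp/list.txt /tmp/goa_human.gaf > /tmp/query_results.txt" (script name hypothetical) produces the same kind of file as the grep step, restricted to exact symbol matches.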

Use GO.db in R to read in GO data, and read your query results into a data frame to get mapped GO terms:

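> library("GO.db")
> go_term_table <- toTable(GOTERM)
> # GAF is tab-separated: split on tabs only (names contain spaces and quotes); column 5 is the GO ID
> df <- read.table("/tmp/query_results.txt", header=FALSE, sep="\t", quote="", comment.char="", fill=TRUE)
> ids <- unique(df$V5)
> unique_go_ids <- ids[grepl("^GO:", ids)]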

You can then query the GO term table against your identifiers; for example, for the Biological Process ontology:

> biological_process <- go_term_table[go_term_table$Ontology == "BP" & go_term_table$go_id %in% unique_go_ids, ]

Repeat as needed for the other ontologies. Use write.table() or a similar function to write the R results to a file, if needed.

See: http://bioconductor.org/packages/release/data/annotation/html/GO.db.html for information on how to install GO.db.

(answer by Alex Reynolds, 8 days ago)

@Alex Reynolds What about "Molecular function" and "Cellular component"? I think I should use "MF" and "CC".

(reply by Learner, 8 days ago)

Seems reasonable to use.
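"BP", "MF" and "CC" are the values used in the Ontology column of the GO.db term table. The GAF file itself also records the ontology, in column 9 (Aspect) as P, F or C, so the query results can be split without R. A shell sketch, assuming tab-separated GAF rows such as /tmp/query_results.txt. It also skips rows whose qualifier (column 4) contains NOT, since those state that the gene product is not associated with the term.

```shell
#!/usr/bin/env bash
# Split GAF rows into one file per ontology using column 9 (Aspect:
# P = BP, F = MF, C = CC), writing unique symbol<TAB>GO-ID pairs.
# Rows whose column 4 qualifier contains NOT are skipped.

split_by_aspect() {
  local gaf_rows="$1" prefix="$2"
  awk -F'\t' -v prefix="$prefix" '
    BEGIN { name["P"] = "BP"; name["F"] = "MF"; name["C"] = "CC" }
    /^!/ { next }                                  # GAF header lines
    $4 ~ /(^|[|])NOT([|]|$)/ { next }              # negated annotations
    ($9 in name) {
      key = $9 SUBSEP $3 SUBSEP $5
      if (!(key in seen)) {                        # one line per gene and term
        seen[key] = 1
        print $3 "\t" $5 > (prefix "_" name[$9] ".tsv")
      }
    }
  ' "$gaf_rows"
}

# Only run when both arguments are given on the command line.
if [ "$#" -ge 2 ]; then
  split_by_aspect "$1" "$2"
fi
```

For example, "bash split_by_aspect.sh /tmp/query_results.txt /tmp/genes" (hypothetical names) writes /tmp/genes_BP.tsv, /tmp/genes_MF.tsv and /tmp/genes_CC.tsv. A file is only created if at least one row goes into it.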

(reply by Alex Reynolds, 8 days ago)

@Alex Reynolds How can I find out which information I can extract from go_term_table? I tried ?go_term_table and help(), but neither shows anything, and googling did not help either. I would appreciate it if you could point me to some documentation. Basically, I want to add the gene name next to the gene ID, definition, etc.

(reply by Learner, 8 days ago)

go_term_table is the name of a variable, so you're not going to get anything out of R from running ?go_term_table.

Run ?toTable if you want to learn about that command, but maybe start with the vignette and then read documentation about specific commands:

• https://www.bioconductor.org/packages/release/bioc/vignettes/annotate/inst/doc/GOusage.pdf

• http://bioconductor.org/packages/release/data/annotation/manuals/GO.db/man/GO.db.pdf

(reply by Alex Reynolds, 8 days ago)

@Alex Reynolds Thanks for the link. Is it possible to keep the information from "query_results" merged with the GO terms, or at least to see the gene name? As I understand it, the first part gives you the GO IDs, and then you extract the term data from GO.db.

(reply by Learner, 7 days ago)

Maybe use join functions to connect the go_term_table lookup with results from df (query_results.txt): https://dplyr.tidyverse.org/reference/join.html

I'd think you could join on the GO:xyz identifier, for instance.
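The same join on the GO:xyz identifier can be sketched in the shell. This swaps GO.db for the GO release file go-basic.obo (assumed to be downloaded from http://purl.obolibrary.org/obo/go/go-basic.obo) as the source of term names, and assumes a two-column, tab-separated input of gene symbol and GO ID. It relies on each [Term] stanza listing its id: line before its name: line, as GO's OBO files do.

```shell
#!/usr/bin/env bash
# Attach GO term names to symbol<TAB>GO-ID pairs, using go-basic.obo (not
# GO.db) as the term source. The join key is the GO ID; unmatched IDs get NA.

add_term_names() {
  local obo_file="$1" pairs_file="$2"
  awk -F'\t' -v OFS='\t' '
    NR == FNR {                                   # first file: the OBO file
      if ($0 ~ /^\[/) { in_term = ($0 == "[Term]"); id = ""; next }
      if (in_term && $0 ~ /^id: GO:/) { id = substr($0, 5); next }
      if (in_term && id != "" && $0 ~ /^name: /) name[id] = substr($0, 7)
      next
    }
    { print $1, $2, (($2 in name) ? name[$2] : "NA") }   # symbol, GO ID, term
  ' "$obo_file" "$pairs_file"
}

# Only run when both files are given on the command line.
if [ "$#" -ge 2 ]; then
  add_term_names "$1" "$2"
fi
```

For example, "cut -f3,5 /tmp/query_results.txt | sort -u > /tmp/pairs.tsv" extracts symbol and GO ID from the GAF rows, and "bash add_terms.sh go-basic.obo /tmp/pairs.tsv" (script name hypothetical) then prints each pair with its term name.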

(reply by Alex Reynolds, 7 days ago)
sammer.kamal91 wrote:

You can use UniProt's Retrieve/ID mapping tool for a list: https://www.uniprot.org/uploadlists/

1. Enter your list, either as a file or as pasted text.
2. Specify the type of identifiers in your list. For gene names you can optionally specify a species; otherwise, entries from every species that has these gene names will be included in your results.

You can control which columns appear in your results table. You need BP, MF and CC, so edit the columns and tick those three under the Gene Ontology (GO) tab.

https://www.uniprot.org/uniprot/?query=yourlist:M201812066746803381A1F0E0DB47453E0216320D06CFD34&sort=yourlist:M201812066746803381A1F0E0DB47453E0216320D06CFD34

(answer by sammer.kamal91, 8 days ago)
Powered by Biostar version 2.3.0