How can I make a gene ontology file in a specific format?
6 weeks ago
CXL ▴ 20

Hello all, I'm new to ontologies, and I'm stuck on generating an input file while trying to reproduce a paper's modeling process. I've spent a lot of time on this without making progress. The paper is https://pubmed.ncbi.nlm.nih.gov/33096023/

The file is here https://github.com/idekerlab/DrugCell/blob/public/data/drugcell_ont.txt

The description of this file on GitHub is as follows.

  • A tab-delimited file that contains the ontology (hierarchy) that defines the structure of a branch of a DrugCell model that encodes the genotypes. The first column is always a term (subsystem or pathway), and the second column is a term or a gene. The third column should be set to "default" when the line represents a link between terms, "gene" when the line represents an annotation link between a term and a gene. The following is an example describing a sample hierarchy.

Here are a few lines I copied from that file:

GO:0007005 GO:0007006 default

GO:0007005 GO:0008637 default

GO:0006281 GO:0006284 default

GO:2000811 PIK3CA gene

GO:2001240 IL1A gene

GO:2001240 SGK3 gene
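
To make the layout concrete, here is a minimal Python sketch that reads lines in this three-column format back into two lookups: term-to-child-term links ("default") and term-to-gene links ("gene"). The function name `parse_ontology_lines` is made up for illustration and is not part of DrugCell. The real file is tab-delimited; splitting on any whitespace also handles the space-separated lines as pasted here.

```python
from collections import defaultdict


def parse_ontology_lines(lines):
    """Split three-column ontology lines into term-term and term-gene links.

    Hypothetical helper for illustration only; not part of DrugCell.
    """
    term_children = defaultdict(list)  # parent term -> child terms
    term_genes = defaultdict(list)     # term -> annotated genes
    for line in lines:
        fields = line.split()  # tabs or spaces
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
        parent, child, kind = fields
        if kind == "default":
            term_children[parent].append(child)
        elif kind == "gene":
            term_genes[parent].append(child)
        else:
            raise ValueError(f"unknown link type {kind!r} in line: {line!r}")
    return dict(term_children), dict(term_genes)


# The six lines quoted above
sample = """\
GO:0007005 GO:0007006 default
GO:0007005 GO:0008637 default
GO:0006281 GO:0006284 default
GO:2000811 PIK3CA gene
GO:2001240 IL1A gene
GO:2001240 SGK3 gene
"""

children, genes = parse_ontology_lines(sample.splitlines())
```

After running this, `children` maps each parent term to its child terms and `genes` maps each term to its annotated gene symbols.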

I checked http://geneontology.org/ but didn't find a way to export a file in this format.

Can someone help me? Thank you in advance, any help will be highly appreciated.

ontology • 345 views
6 weeks ago
tomas4482 ▴ 280

I highly recommend the R package clusterProfiler, as it implements the hypergeometric test for GO, KEGG, and custom gene sets.

Two functions, enrichGO and enricher, both suit your problem.

I'll provide an example.

library(clusterProfiler)
library(org.Hs.eg.db)

# Optional: map gene symbols to Entrez IDs
golist <- AnnotationDbi::select(org.Hs.eg.db, keys = your_geneset$gene_symbol,
                                columns = c("ENTREZID", "SYMBOL"), keytype = "SYMBOL")
golist <- na.omit(golist)

# GO over-representation test across all three ontologies, keyed by gene symbol
allgo <- enrichGO(your_geneset$gene_symbol, OrgDb = org.Hs.eg.db, ont = "ALL",
                  pAdjustMethod = "BH", pvalueCutoff = 0.05, qvalueCutoff = 0.2,
                  keyType = "SYMBOL")

The object allgo contains a data frame, which is arranged as you wanted.

To get an example data frame, you can go to the Gene Ontology website and paste their sample gene set for enrichment first. Then go back to the R script above and run the hypergeometric test on this gene set.


Thank you for your help. But the data frame in allgo doesn't seem to contain the tree structure of the example file I pasted. Or did I miss it? What that file contains is the links between a parent term and its child terms or genes, in 3 columns: the 1st column is the parent term, the 2nd column is a child term or a gene, and the 3rd column is "default" (for a term) or "gene".


I read the publication you cited. You are looking for hierarchically structured data. The code above does not help with this question.

For your purpose, there are two bioinformatics tools available: goatools and GOfuncR.
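
As a rough illustration of what such a tool would be used for here, the Python sketch below builds the three-column lines from OBO-format text and a list of gene annotations using only the standard library. It is a sketch under stated assumptions, not the DrugCell authors' method: it follows only `is_a` links, the helper names (`read_is_a_edges`, `build_ontology_lines`) and the toy OBO text are made up, and a real pipeline would likely also handle obsolete terms, other relation types, and filtering of terms by gene count.

```python
def read_is_a_edges(obo_text):
    """Return (parent, child) pairs from the is_a lines of OBO-format text.

    Hypothetical minimal parser: only [Term] stanzas and is_a links are read.
    """
    edges = []
    term_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest
            in_term = (line == "[Term]")
            term_id = None
        elif in_term and line.startswith("id: "):
            term_id = line[len("id: "):].strip()
        elif in_term and term_id and line.startswith("is_a: "):
            # Drop the trailing "! term name" comment if present
            parent = line[len("is_a: "):].split("!")[0].strip()
            edges.append((parent, term_id))
    return edges


def build_ontology_lines(obo_text, gene_annotations):
    """Format term-term links as 'default' and term-gene links as 'gene'."""
    lines = [f"{p}\t{c}\tdefault" for p, c in read_is_a_edges(obo_text)]
    lines += [f"{term}\t{gene}\tgene" for term, gene in gene_annotations]
    return lines


# Toy OBO text (made up to mirror the sample lines; not real GO content)
toy_obo = """\
[Term]
id: GO:0007005
name: toy parent term

[Term]
id: GO:0007006
name: toy child term A
is_a: GO:0007005 ! toy parent term

[Term]
id: GO:0008637
name: toy child term B
is_a: GO:0007005 ! toy parent term

[Typedef]
id: part_of
"""

# Hypothetical (term, gene) annotation pairs, e.g. extracted from a GAF file
toy_annotations = [("GO:2001240", "IL1A"), ("GO:2001240", "SGK3")]

ont_lines = build_ontology_lines(toy_obo, toy_annotations)
print("\n".join(ont_lines))
```

In a GAF annotation file, the gene symbol is in column 3 and the GO ID in column 5, so the `(term, gene)` pairs can be collected from those two columns before being passed in.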
