9.7 years ago
jack
Hi,
I want to run a gene enrichment analysis with Ontologizer, but there is no predefined association file because my organism is not a model organism.
I have parsed a file with two columns for that organism like this:
geneA GO:0006950,GO:0005737
geneB GO:0016020,GO:0005524,GO:0006468,GO:0005737,GO:0004674,GO:0006914,GO:0016021,GO:0015031
geneC GO:0003779,GO:0006941,GO:0005524,GO:0003774,GO:0005516,GO:0005737,GO:0005863
geneD GO:0005634,GO:0003677,GO:0030154,GO:0006350,GO:0006355,GO:0007275,GO:0030528
I have also downloaded the .ob file from the Gene Ontology site, which contains this information (http://www.geneontology.org/doc/GO.terms_and_ids):
!
! GO IDs (primary only) and name text strings
! GO:0000000 [tab] text string [tab] F|P|C
! where F = molecular function, P = biological process, C = cellular component
!
GO:0000001 mitochondrion inheritance P
GO:0000002 mitochondrial genome maintenance P
GO:0000003 reproduction P
GO:0000005 ribosomal chaperone activity F
GO:0000006 high affinity zinc uptake transmembrane transporter activity F
GO:0000007 low-affinity zinc ion transmembrane transporter activity F
GO:0000008 thioredoxin F
GO:0000009 alpha-1,6-mannosyltransferase activity F
GO:0000010 trans-hexaprenyltranstransferase activity F
GO:0000011 vacuole inheritance P
What I need as output is a .gaf file in the following format (the same format as the files here: http://geneontology.org/page/download-annotations):
!gaf-version: 2.0
!Project_name: Leishmania major GeneDB
!URL: http://www.genedb.org/leish
!Contact Email: mb4@sanger.ac.uk
GeneDB_Lmajor LmjF.36.4770 LmjF.36.4770 GO:0003723 PMID:22396527 ISO GeneDB:Tb927.10.10130 F mitochondrial RNA binding complex 1 subunit, putative LmjF36.4770 gene taxon:347515 20120910 GeneDB_Lmajor
GeneDB_Lmajor LmjF.36.4770 LmjF.36.4770 GO:0044429 PMID:20660476 ISS C mitochondrial RNA binding complex 1 subunit, putative LmjF36.4770 gene taxon:347515 20100803 GeneDB_Lmajor
GeneDB_Lmajor LmjF.36.4770 LmjF.36.4770 GO:0016554 PMID:22396527 ISO GeneDB:Tb927.10.10130 P mitochondrial RNA binding complex 1 subunit, putative LmjF36.4770 gene taxon:347515 20120910 GeneDB_Lmajor
GeneDB_Lmajor LmjF.36.4770 LmjF.36.4770 GO:0048255 PMID:22396527 ISO GeneDB:Tb927.10.10130 P mitochondrial RNA binding complex 1 subunit, putative LmjF36.4770 gene taxon:347515 20120910 GeneDB_Lmajor
Can someone help me with a bash script to do this?
Which organism are you working with?
Hello, I want to do the same thing. How did you resolve this problem?
If your genes are already annotated and all you need is to put these annotations into the GO annotation file format, the format is described here. Use your favorite scripting language to convert what you have to this format. You probably only need the first 9 columns.
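As a rough illustration of that conversion, here is a minimal Python sketch (Python rather than bash, purely as a choice for this example). It assumes the two-column file is whitespace-separated with comma-separated GO IDs, and that GO.terms_and_ids is tab-separated as its header says. The database name `MyDB`, the reference `MyDB_REF:0000001`, the taxon `taxon:0` and the date are hypothetical placeholders to replace with your own values; `IEA` is used as the evidence code only on the assumption that the annotations were inferred electronically. GO IDs that are not in the terms file (for example obsolete terms) are skipped.

```python
def load_aspects(lines):
    """Map GO ID -> aspect letter (F, P or C) from GO.terms_and_ids lines."""
    aspects = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue  # skip blank lines and "!" comment lines
        fields = line.split("\t")
        if len(fields) >= 3:
            aspects[fields[0]] = fields[2].strip()
    return aspects


def to_gaf_rows(assoc_lines, aspects, db="MyDB", reference="MyDB_REF:0000001",
                evidence="IEA", taxon="taxon:0", date="19700101"):
    """Turn 'gene GO:1,GO:2' lines into 17-column GAF 2.0 rows.

    All keyword defaults are placeholders, not real identifiers.
    """
    rows = []
    for line in assoc_lines:
        parts = line.split(None, 1)  # gene ID, then the GO list
        if len(parts) != 2:
            continue
        gene, go_list = parts
        for go_id in go_list.strip().split(","):
            go_id = go_id.strip()
            aspect = aspects.get(go_id)
            if aspect is None:
                continue  # GO ID not in the terms file, e.g. obsolete
            rows.append("\t".join([
                db,         # 1  DB
                gene,       # 2  DB Object ID
                gene,       # 3  DB Object Symbol
                "",         # 4  Qualifier
                go_id,      # 5  GO ID
                reference,  # 6  DB:Reference
                evidence,   # 7  Evidence Code
                "",         # 8  With/From
                aspect,     # 9  Aspect
                "",         # 10 DB Object Name
                "",         # 11 DB Object Synonym
                "gene",     # 12 DB Object Type
                taxon,      # 13 Taxon
                date,       # 14 Date (YYYYMMDD)
                db,         # 15 Assigned By
                "",         # 16 Annotation Extension
                "",         # 17 Gene Product Form ID
            ]))
    return rows


# Small in-memory demo; with real data, pass open file handles instead.
demo_terms = ["! comment line\n",
              "GO:0006950\tresponse to stress\tP\n",
              "GO:0005737\tcytoplasm\tC\n"]
demo_assoc = ["geneA GO:0006950,GO:0005737\n"]

print("!gaf-version: 2.0")
for row in to_gaf_rows(demo_assoc, load_aspects(demo_terms)):
    print(row)
```

With real files, something like `to_gaf_rows(open("genes2go.txt"), load_aspects(open("GO.terms_and_ids")))` would do the same job, where `genes2go.txt` is a hypothetical name for the two-column file; redirect the printed lines to a `.gaf` file.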
Also have a look at GeneSCF, if it works for you.
Note: Check this question regarding non-model organisms and GeneSCF usage.