4.6 years ago
CephBirk
I have a .go file from the supplementary material of a paper and am trying to understand the formatting. Here's a sample:
Ocbimv22017828m GO:chromatin binding ; GO:0003682;GO:protein binding ; GO:0005515
Ocbimv22003392m GO:protein binding ; GO:0005515
Ocbimv22036412m GO:DNA binding ; GO:0003677|GO:DNA-directed RNA polymerase activity ; GO:0003899|GO:transcription, DNA-templated ; GO:0006351
Ocbimv22003166m GO:scavenger receptor activity ; GO:0005044|GO:membrane ; GO:0016020
Ocbimv22034134m GO:protein binding ; GO:0005515|GO:zinc ion binding ; GO:0008270
Ocbimv22036284m GO:sequence-specific DNA binding transcription factor activity ; GO:0003700|GO:regulation of transcription, DNA-dependent ; GO:0006355|GO:nucleus ; GO:0005634
Ocbimv22004380m GO:transmembrane transporter activity ; GO:0022857|GO:transmembrane transport ; GO:0055085|GO:integral to membrane ; GO:0016021
The Ocbimv22... are the transcript IDs and each has corresponding GO terms. However, sometimes GO terms are separated by semicolons and sometimes by vertical lines. Is this a standard file format (I'm new to this field)? I've tried contacting the corresponding author but have not heard back... Does a semicolon mean something different from a vertical line? Or is it safe to assume they're synonymous?
Doesn't seem to be a standard format, and in fact it seems kind of messy. Semicolons sometimes separate the GO accession number from its name, as in
GO:protein binding ; GO:0005515
Other times, semicolons are separating pairs of accessions / names, like in
GO:chromatin binding ; GO:0003682 ; GO:protein binding ; GO:0005515
I think it just means the formatting is wrong, with semicolons meant to separate a GO accession from its name, and pipes (the vertical bars) meant to separate different pairs of accessions / names, but sometimes semicolons were erroneously used.
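If that reading is right, the file can be parsed without caring which separator sits between pairs. Here is a minimal sketch in Python, assuming every pair has the shape "GO:name ; GO:" followed by a 7-digit accession, and that the transcript ID is the first whitespace-delimited field. The function name parse_line is made up for illustration, and I have only checked the idea against the sample lines in the question, not the full file.

```python
import re

# One pair looks like "GO:<name> ; GO:<7 digits>". The name is matched lazily
# so it can contain commas or hyphens (e.g. "transcription, DNA-templated").
PAIR_RE = re.compile(r"GO:(?P<name>.+?)\s*;\s*GO:(?P<acc>\d{7})")


def parse_line(line):
    """Return (transcript_id, [(name, accession), ...]) for one line.

    Whatever sits between two pairs ("|" or a stray ";") is simply skipped,
    so both separators are treated as equivalent. This is an assumption
    about the file, not something the paper confirms.
    """
    parts = line.strip().split(None, 1)  # split on the first run of whitespace
    transcript_id = parts[0]
    annotation = parts[1] if len(parts) > 1 else ""
    pairs = [
        (m.group("name").strip(), "GO:" + m.group("acc"))
        for m in PAIR_RE.finditer(annotation)
    ]
    return transcript_id, pairs


if __name__ == "__main__":
    sample = (
        "Ocbimv22017828m GO:chromatin binding ; GO:0003682;"
        "GO:protein binding ; GO:0005515"
    )
    print(parse_line(sample))
```

It would still be worth spot-checking a few accessions against their names on AmiGO, since this only normalizes the separators and does not validate the annotations themselves.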
It would be helpful if you included a link to the paper. Did you read its materials and methods? What did it say about transcriptome annotation? What software was used?
With the exception of the very first entry (from OP), for a given transcript, it looks like this to me:
e.g. a GO term pair with description and ID: GO:chromatin binding ; GO:0003682
From http://amigo.geneontology.org/amigo/term/GO:0003682, the description of GO:0003682 is GO:chromatin binding
for eg. GO:chromatin binding ; GO:0003682
e.g. GO:0005515 and GO:0008270 are separated by | with no spaces before or after it: GO:protein binding ; GO:0005515|GO:zinc ion binding ; GO:0008270
Eg.