Assigning gene ontologies to a big set of protein sequences
6.7 years ago
lionel.u.l

I'm currently running automatic annotation on a large set of genomes (~50), and I'd like to add Gene Ontology (GO) terms to these annotations.

I was pretty sure I could do this with PANTHER through InterProScan by providing the protein sequences, but the output was not what I expected.

What I need is something like a [gene, GO_tag] mapping. Instead I got a [gene, PANTHER_tag] mapping, and I don't know how to convert PANTHER IDs to GO terms at a large scale (~1 million protein sequences).

InterProScan output (extract):

Sjp_0023430     002f3d754e77128356196b17d714e0d1        768     PANTHER PTHR23255               97      680     1.7E-260        T       09-08-2017
Sjp_0023430     002f3d754e77128356196b17d714e0d1        768     PANTHER PTHR23255:SF54          97      680     1.7E-260        T       09-08-2017
Sjp_0096970     5c91a96644aeebcfb4811c28470d722a        518     PANTHER PTHR24242:SF35          20      366     3.3E-117        T       09-08-2017
Sjp_0096970     5c91a96644aeebcfb4811c28470d722a        518     PANTHER PTHR24242               20      366     3.3E-117        T       09-08-2017
Sjp_0019830     b204ab2339eacc5cfeb797f2b334ed74        303     PANTHER PTHR21539               6       291     9.7E-94         T       09-08-2017
Sjp_0095290     d7aaca32385a20be5c2d54e5c581215a        582     PANTHER PTHR24028               109     573     8.0E-110        T       09-08-2017
Sjp_0095290     d7aaca32385a20be5c2d54e5c581215a        582     PANTHER PTHR24028:SF47          109     573     8.0E-110        T       09-08-2017
Sjp_0081580     77104ef6e08357213850e9071dacd5d9        1055    PANTHER PTHR24028               1       1033    2.1E-190        T       09-08-2017
Sjp_0081580     77104ef6e08357213850e9071dacd5d9        1055    PANTHER PTHR24028:SF47          1       1033    2.1E-190        T       09-08-2017

Is there any way I can do this?

Thanks a lot in advance!
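A minimal Python sketch of the [gene, PANTHER_tag] to [gene, GO_tag] join, assuming you already have a table mapping PANTHER family/subfamily IDs to GO terms (for example, one built from PANTHER's own classification downloads; how you build it is left open here, and `family_to_go` is a hypothetical name). It reads the InterProScan TSV, where column 1 is the protein ID, column 4 the analysis and column 5 the signature accession, and keeps only the PANTHER rows.

```python
import csv
from collections import defaultdict


def panther_hits(tsv_lines):
    """Collect PANTHER signature accessions per protein from InterProScan TSV rows.

    Column 1 = protein accession, column 4 = analysis, column 5 = signature
    accession (e.g. PTHR24028 or PTHR24028:SF47).
    """
    hits = defaultdict(set)
    for row in csv.reader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 5 or row[3] != "PANTHER":
            continue  # skip short rows and other member databases
        hits[row[0]].add(row[4])
    return hits


def to_go(hits, family_to_go):
    """Join protein -> PANTHER IDs with a (hypothetical) PANTHER ID -> GO terms table.

    Each ID is looked up directly; a subfamily ID (PTHRxxxxx:SFyy) missing
    from the table falls back to its parent family. Returns sorted
    (protein, GO ID) pairs.
    """
    pairs = set()
    for protein, ids in hits.items():
        for pid in ids:
            terms = family_to_go.get(pid) or family_to_go.get(pid.split(":")[0], set())
            for go in terms:
                pairs.add((protein, go))
    return sorted(pairs)
```

Usage would be along the lines of `to_go(panther_hits(open("output.tsv")), family_to_go)`, where the file name is hypothetical. Whether a family-level GO mapping is precise enough for your purpose is a separate question.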


As far as I can tell, IDs of the form PTHR24242 refer to PANTHER families (and IDs like PTHR24242:SF35 to subfamilies). PANTHER families are defined by PANTHER and don't necessarily match other resources' definitions. If you need GO annotations, use GO terms directly, not something else that may or may not have a GO equivalent.
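If GO terms are what you need, InterProScan 5 can add them itself: run it with `-goterms` (which implies `-iprlookup`) and the TSV output gains InterPro and GO columns, with GO IDs in column 14 separated by `|` (or `-` when there are none). The sketch below collects them per protein. Newer releases may append a source in parentheses to each ID, so it simply extracts anything shaped like `GO:` plus seven digits; treat the column position as an assumption to check against the documentation for your InterProScan version.

```python
import csv
import re
from collections import defaultdict

# A GO identifier is "GO:" followed by seven digits.
GO_ID = re.compile(r"GO:\d{7}")


def go_terms_per_protein(tsv_lines):
    """Collect GO IDs per protein from column 14 of an InterProScan TSV.

    Assumes GO lookup was enabled, so column 14 holds '|'-separated GO IDs
    (possibly with a '(source)' suffix) or '-'. Rows without that column,
    and proteins with no GO terms at all, are left out.
    """
    terms = defaultdict(set)
    for row in csv.reader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 14:
            continue  # no GO column on this row
        found = GO_ID.findall(row[13])
        if found:
            terms[row[0]].update(found)
    return terms
```

Because the terms come from InterPro's own curated mapping to GO, this avoids building a PANTHER-to-GO translation yourself.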


That would be great, actually. Is there any tool I can use to annotate proteins with GO terms locally and at high throughput?
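For local throughput on ~1 million sequences, a common pattern is to split the input FASTA into fixed-size chunks and run each chunk as an independent job (InterProScan, BLAST, DIAMOND, ...). A minimal Python sketch; the chunk size and the `chunk_NNNN.fasta` naming are arbitrary choices made here, not anything a tool requires.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) records from a FASTA file."""
    header, seq = None, []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line[1:], []
            elif line:
                seq.append(line)  # join wrapped sequence lines
    if header is not None:
        yield header, "".join(seq)


def split_fasta(path, out_dir, chunk_size=10000):
    """Write records to chunk_0000.fasta, chunk_0001.fasta, ...

    Each chunk holds at most chunk_size records. Returns the written paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths, handle = [], None
    for count, (header, seq) in enumerate(read_fasta(path)):
        if count % chunk_size == 0:  # start a new chunk file
            if handle:
                handle.close()
            chunk_path = out_dir / f"chunk_{len(paths):04d}.fasta"
            paths.append(chunk_path)
            handle = open(chunk_path, "w")
        handle.write(f">{header}\n{seq}\n")
    if handle:
        handle.close()
    return paths
```

Each chunk can then be submitted as its own job, e.g. `interproscan.sh -i chunk_0000.fasta -f tsv -goterms`, and the per-chunk TSV files concatenated afterwards, since the TSV format has no header line.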


If it applies in your case, you could transfer annotations by orthology or simply by homology. Throughput then seems to be mostly a matter of parallelization.
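A naive version of homology transfer: take each query's best hit against a GO-annotated reference proteome and copy that reference protein's GO terms. The sketch assumes BLAST or DIAMOND tabular output in the default 12-column layout (`-outfmt 6`) and a `reference_go` dict (protein ID to set of GO IDs) that you build yourself from the reference's annotations; the 40% identity and 1e-10 e-value cut-offs are illustrative only, not recommended values. Orthology-based tools such as eggNOG-mapper are designed to restrict transfer to orthologs, which best-hit transfer does not do.

```python
import csv


def best_hits(blast_lines, min_pident=40.0, max_evalue=1e-10):
    """Return query -> (subject, bitscore) for the top-scoring hit passing the cut-offs.

    Expects the default 12-column tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(blast_lines, delimiter="\t"):
        if len(row) < 12:
            continue
        query, subject = row[0], row[1]
        pident, evalue, bitscore = float(row[2]), float(row[10]), float(row[11])
        if pident < min_pident or evalue > max_evalue:
            continue  # hit too weak to trust for annotation transfer
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def transfer_go(best, reference_go):
    """Copy the GO terms of each query's best hit; queries whose hit has no GO terms are dropped."""
    return {q: set(reference_go[s]) for q, (s, _) in best.items() if s in reference_go}
```

Since each query is handled independently, the search step parallelizes trivially by splitting the query FASTA and concatenating the hit tables before running this.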
