Question: Assigning gene ontologies to a big set of protein sequences
lionel.u.l0 wrote:

I'm currently performing automatic annotations on a large set of genomes (~50) and I'd like to add gene ontologies to these annotations.

I was pretty sure I could do that with PANTHER through InterProScan by providing the protein sequences, but the output was not what I expected.

What I need is something like a [gene, GO_tag] mapping, but instead I got a [gene, PANTHER_tag] mapping, which I don't know how to convert to Gene Ontology terms at a large scale (~1 million protein sequences).

InterProScan output extract:

**Sjp_0023430**     002f3d754e77128356196b17d714e0d1        768     PANTHER **PTHR23255**               97      680     1.7E-260        T       09-08-2017
**Sjp_0023430**     002f3d754e77128356196b17d714e0d1        768     PANTHER **PTHR23255:SF54**          97      680     1.7E-260        T       09-08-2017
**Sjp_0096970**     5c91a96644aeebcfb4811c28470d722a        518     PANTHER **PTHR24242:SF35**          20      366     3.3E-117        T       09-08-2017
**Sjp_0096970**     5c91a96644aeebcfb4811c28470d722a        518     PANTHER **PTHR24242**               20      366     3.3E-117        T       09-08-2017
**Sjp_0019830**     b204ab2339eacc5cfeb797f2b334ed74        303     PANTHER **PTHR21539**               6       291     9.7E-94 T       09-08-2017
**Sjp_0095290**     d7aaca32385a20be5c2d54e5c581215a        582     PANTHER **PTHR24028**               109     573     8.0E-110        T       09-08-2017
**Sjp_0095290**     d7aaca32385a20be5c2d54e5c581215a        582     PANTHER **PTHR24028:SF47**          109     573     8.0E-110        T       09-08-2017
**Sjp_0081580**     77104ef6e08357213850e9071dacd5d9        1055    PANTHER **PTHR24028**               1       1033    2.1E-190        T       09-08-2017
**Sjp_0081580**     77104ef6e08357213850e9071dacd5d9        1055    PANTHER **PTHR24028:SF47**          1       1033    2.1E-190        T       09-08-2017
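For illustration, here is a minimal Python sketch of the conversion being asked about. It assumes the real InterProScan output is a tab-separated file with the gene ID in column 1, the analysis name in column 4, and the signature accession in column 5, as in the extract above. It also assumes that a PANTHER-to-GO lookup table is already available as a dictionary. The function names and the `panther_to_go` dictionary are hypothetical, and the sketch says nothing about where such a lookup would come from.

```python
import csv


def parse_interproscan_tsv(lines):
    """Yield (gene, signature) pairs for the PANTHER rows of a TSV output.

    `lines` is any iterable of tab-separated text lines, such as an open file.
    Column positions are an assumption based on the extract shown above.
    """
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 5:
            continue  # skip blank or truncated lines
        gene, analysis, signature = row[0], row[3], row[4]
        if analysis == "PANTHER":
            yield gene, signature


def map_genes_to_go(pairs, panther_to_go):
    """Build {gene: set of GO terms} from (gene, signature) pairs.

    `panther_to_go` is a hypothetical lookup {PANTHER ID: iterable of GO IDs}.
    Genes whose signatures have no entry in the lookup get an empty set.
    """
    gene_to_go = {}
    for gene, signature in pairs:
        terms = panther_to_go.get(signature, ())
        gene_to_go.setdefault(gene, set()).update(terms)
    return gene_to_go
```

With a file handle `fh` opened on the output, `map_genes_to_go(parse_interproscan_tsv(fh), panther_to_go)` would give one set of GO terms per gene, combining the family and subfamily rows.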

Is there any way I can do this?

Thanks a lot in advance!

modified 9 weeks ago by genomax • written 9 weeks ago by lionel.u.l

As far as I can tell, IDs of the form PTHR24242 refer to PANTHER families. These families are defined by PANTHER itself and don't necessarily match other resources' definitions. If you need GO annotations, then annotate with GO terms directly rather than with something else that may or may not have a GO equivalent.

written 9 weeks ago by Jean-Karim Heriche

That would be great, actually. Is there any tool I can run locally to annotate proteins with GO terms in a high-throughput way?

modified 9 weeks ago • written 9 weeks ago by lionel.u.l

If applicable in your case, you could transfer annotations by orthology or simply by homology. At this scale, throughput is probably just a matter of parallelization.
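As a rough sketch of what homology-based transfer could look like in Python: assume a similarity search has already produced (query, subject, e-value) hits against a set of proteins that carry GO annotations. Each query then inherits the GO terms of its best-scoring hit. The function names, the `subject_go` dictionary, and the `1e-10` cutoff are all illustrative assumptions, not recommended settings. The `chunked` helper only shows how the input could be split into batches for parallel jobs. It does not run anything in parallel itself.

```python
from itertools import islice


def chunked(records, size):
    """Split an iterable into lists of at most `size` items, one list per job."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def transfer_go_by_best_hit(hits, subject_go, max_evalue=1e-10):
    """Give each query the GO terms of its best hit (lowest e-value).

    `hits` is an iterable of (query, subject, evalue) tuples.
    `subject_go` is a hypothetical lookup {subject ID: iterable of GO IDs}.
    Hits with an e-value above `max_evalue` (an arbitrary cutoff) are ignored,
    so queries without a good enough hit are absent from the result.
    """
    best = {}
    for query, subject, evalue in hits:
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {
        query: set(subject_go.get(subject, ()))
        for query, (subject, _) in best.items()
    }
```

Transferring from the single best hit is the simplest policy. Stricter variants, such as requiring reciprocal best hits as a proxy for orthology, would reduce the risk of propagating wrong annotations.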

written 9 weeks ago by Jean-Karim Heriche