Question

Assigning gene ontologies to a big set of protein sequences

6.8 years ago

lionel.u.l • 0

I'm currently performing automatic annotations on a large set of genomes (~50) and I'd like to add gene ontologies to these annotations.

I was pretty sure I could do that with PANTHER via InterProScan by providing the protein sequences, but the output was not what I expected.

What I need is something like a [gene, GO_tag] mapping, but instead I got a [gene, PANTHER_tag] mapping, which I don't know how to convert to Gene Ontology terms at large scale (~1 million protein sequences).

InterProScan output extract

**Sjp_0023430**     002f3d754e77128356196b17d714e0d1        768     PANTHER **PTHR23255**               97      680     1.7E-260        T       09-08-2017
**Sjp_0023430**     002f3d754e77128356196b17d714e0d1        768     PANTHER **PTHR23255:SF54**          97      680     1.7E-260        T       09-08-2017
**Sjp_0096970**     5c91a96644aeebcfb4811c28470d722a        518     PANTHER **PTHR24242:SF35**          20      366     3.3E-117        T       09-08-2017
**Sjp_0096970**     5c91a96644aeebcfb4811c28470d722a        518     PANTHER **PTHR24242**               20      366     3.3E-117        T       09-08-2017
**Sjp_0019830**     b204ab2339eacc5cfeb797f2b334ed74        303     PANTHER **PTHR21539**               6       291     9.7E-94         T       09-08-2017
**Sjp_0095290**     d7aaca32385a20be5c2d54e5c581215a        582     PANTHER **PTHR24028**               109     573     8.0E-110        T       09-08-2017
**Sjp_0095290**     d7aaca32385a20be5c2d54e5c581215a        582     PANTHER **PTHR24028:SF47**          109     573     8.0E-110        T       09-08-2017
**Sjp_0081580**     77104ef6e08357213850e9071dacd5d9        1055    PANTHER **PTHR24028**               1       1033    2.1E-190        T       09-08-2017
**Sjp_0081580**     77104ef6e08357213850e9071dacd5d9        1055    PANTHER **PTHR24028:SF47**          1       1033    2.1E-190        T       09-08-2017

Is there any way I can do this?

Thanks a lot in advance!

gene ontologies • 1.1k views

updated 6.8 years ago by GenoMax 142k • written 6.8 years ago by lionel.u.l • 0

As far as I can tell, IDs of the form PTHR24242 refer to PANTHER families. PANTHER families are defined by PANTHER and don't necessarily match other resources' definitions. If you need GO annotations then use GO terms, not something else that may or may not have a GO equivalent.
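To make the family/subfamily distinction concrete, here is a minimal Python sketch that reads InterProScan TSV rows like the ones in the question and collects the PANTHER accessions per protein, then joins them against a PANTHER-to-GO table. The function names and the `panther2go` table are assumptions for illustration: such a table would have to come from elsewhere, and, as noted above, a given family may have no GO equivalent at all.

```python
import csv
import io
from collections import defaultdict


def panther_hits(tsv_text):
    """Collect PANTHER signature accessions per protein from InterProScan TSV text."""
    hits = defaultdict(set)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        # TSV columns: protein, MD5, length, analysis, signature accession, ...
        if len(row) > 4 and row[3] == "PANTHER":
            hits[row[0]].add(row[4])
    return hits


def to_go(hits, panther2go):
    """Join gene -> PANTHER IDs with a user-supplied PANTHER -> GO table.

    `panther2go` is hypothetical: a dict mapping a family or subfamily
    accession to a set of GO term IDs.
    """
    gene2go = defaultdict(set)
    for gene, accessions in hits.items():
        for acc in accessions:
            family = acc.split(":")[0]  # PTHR23255:SF54 -> PTHR23255
            # Collect terms attached to the subfamily and to its parent family.
            for key in (acc, family):
                gene2go[gene].update(panther2go.get(key, ()))
    return gene2go
```

Proteins whose accessions are missing from the table end up with an empty set, which makes the coverage gap easy to measure before relying on the result.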

6.8 years ago by Jean-Karim Heriche 27k

That would be great, actually. Is there any tool I can use to annotate proteins with GO terms locally and at high throughput?

6.8 years ago by lionel.u.l • 0

If applicable in your case, you could transfer annotations by orthology or simply by homology. It seems that the throughput in your case could be a matter of parallelization.
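A rough sketch of what homology-based transfer could look like in Python, assuming you have already searched your proteins against an annotated reference set and have the hits in BLAST tabular format (the default 12-column `-outfmt 6` layout). The E-value cutoff, the function names and the `reference_go` table are assumptions for illustration, and best-hit transfer is a simplification: proper orthology inference needs more than a single best hit.

```python
def best_hits(blast_tab, max_evalue=1e-5):
    """Return {query: best subject} by bitscore from BLAST outfmt 6 text."""
    best = {}
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:  # assumed cutoff, tune for your data
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def transfer_go(best, reference_go):
    """Copy GO terms from each best hit; `reference_go` is a hypothetical
    dict mapping reference protein IDs to sets of GO term IDs."""
    return {q: set(reference_go.get(s, ())) for q, s in best.items()}


def split_fasta(fasta_text, n_chunks):
    """Distribute FASTA records round-robin into n_chunks strings, so each
    chunk can be searched as an independent job."""
    records = [">" + r for r in fasta_text.split(">") if r.strip()]
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return ["".join(chunk) for chunk in chunks]
```

Since each protein is searched independently, splitting the input into chunks and running the chunks as separate jobs is usually enough to handle the parallelization.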

6.8 years ago by Jean-Karim Heriche 27k