6.8 years ago
lionel.u.l
I'm currently performing automatic annotations on a large set of genomes (~50) and I'd like to add gene ontologies to these annotations.
I was pretty sure I could do that with PANTHER via InterProScan by providing protein sequences, but the output was not what I expected.
What I need is something like a [gene, GO_tag] mapping, but instead I got [gene, PANTHER_tag] pairs, which I don't know how to convert to Gene Ontology terms at large scale (~1 million protein sequences).
InterProScan output extract:
**Sjp_0023430** 002f3d754e77128356196b17d714e0d1 768 PANTHER **PTHR23255** 97 680 1.7E-260 T 09-08-2017
**Sjp_0023430** 002f3d754e77128356196b17d714e0d1 768 PANTHER **PTHR23255:SF54** 97 680 1.7E-260 T 09-08-2017
**Sjp_0096970** 5c91a96644aeebcfb4811c28470d722a 518 PANTHER **PTHR24242:SF35** 20 366 3.3E-117 T 09-08-2017
**Sjp_0096970** 5c91a96644aeebcfb4811c28470d722a 518 PANTHER **PTHR24242** 20 366 3.3E-117 T 09-08-2017
**Sjp_0019830** b204ab2339eacc5cfeb797f2b334ed74 303 PANTHER **PTHR21539** 6 291 9.7E-94 T 09-08-2017
**Sjp_0095290** d7aaca32385a20be5c2d54e5c581215a 582 PANTHER **PTHR24028** 109 573 8.0E-110 T 09-08-2017
**Sjp_0095290** d7aaca32385a20be5c2d54e5c581215a 582 PANTHER **PTHR24028:SF47** 109 573 8.0E-110 T 09-08-2017
**Sjp_0081580** 77104ef6e08357213850e9071dacd5d9 1055 PANTHER **PTHR24028** 1 1033 2.1E-190 T 09-08-2017
**Sjp_0081580** 77104ef6e08357213850e9071dacd5d9 1055 PANTHER **PTHR24028:SF47** 1 1033 2.1E-190 T 09-08-2017
Is there any way I can do this?
Thanks a lot in advance!
As far as I can tell, IDs of the form PTHR24242 refer to PANTHER families. These families are defined by PANTHER itself and don't necessarily match the definitions used by other resources. If you need GO annotations, then annotate with GO terms directly rather than with something that may or may not have a GO equivalent.
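To make the point about missing equivalents concrete, here is a minimal Python sketch that pulls [gene, PANTHER_ID] pairs out of rows like the ones in the question and joins them against a lookup table. The function names and the `panther_to_go` dictionary are hypothetical, and the GO ID in the example is a placeholder, not a real annotation for that family. Where such a table would come from is left open; the sketch only shows the join and keeps track of the IDs that have no GO entry.

```python
from collections import defaultdict


def parse_panther_hits(lines):
    """Yield (gene, panther_id) pairs from whitespace-separated InterProScan rows.

    Assumes the column order shown in the question: protein ID, MD5,
    length, analysis name, signature accession, ...
    """
    for line in lines:
        # Drop the bold markers (**) that appear in the pasted extract.
        fields = line.replace("*", "").split()
        if len(fields) < 5 or fields[3] != "PANTHER":
            continue
        yield fields[0], fields[4]


def map_genes_to_go(hits, panther_to_go):
    """Join (gene, panther_id) pairs against a PANTHER -> GO lookup table.

    panther_to_go is a hypothetical dict {panther_id: set of GO IDs}.
    Returns the gene -> GO mapping and the set of pairs without any GO entry.
    """
    gene_to_go = defaultdict(set)
    unmapped = set()
    for gene, panther_id in hits:
        terms = panther_to_go.get(panther_id)
        if terms:
            gene_to_go[gene].update(terms)
        else:
            unmapped.add((gene, panther_id))
    return dict(gene_to_go), unmapped


rows = [
    "Sjp_0023430 002f3d754e77128356196b17d714e0d1 768 PANTHER PTHR23255 97 680 1.7E-260 T 09-08-2017",
    "Sjp_0023430 002f3d754e77128356196b17d714e0d1 768 PANTHER PTHR23255:SF54 97 680 1.7E-260 T 09-08-2017",
    "Sjp_0096970 5c91a96644aeebcfb4811c28470d722a 518 PANTHER PTHR24242 20 366 3.3E-117 T 09-08-2017",
]

# Placeholder lookup table: the GO ID below is illustrative only.
panther_to_go = {"PTHR23255": {"GO:0000001"}}

gene_to_go, unmapped = map_genes_to_go(parse_panther_hits(rows), panther_to_go)
```

With this toy table only the family-level hit for Sjp_0023430 gets a GO term; the subfamily hit and the Sjp_0096970 hit end up in `unmapped`, which is exactly the kind of gap to expect when going through an intermediate classification.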
That would be great, actually. Is there any tool I can use to annotate proteins with GO terms locally and at high throughput?
If applicable in your case, you could transfer annotations by orthology or simply by homology. It seems that the throughput in your case could be a matter of parallelization.
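On the parallelization point, one common approach is to split the protein FASTA into fixed-size batches and hand each batch to a separate annotation job. The sketch below shows only the splitting step in Python; `read_fasta` and `chunk_records` are hypothetical helper names, the sequences are made up, and how each batch is then submitted (cluster scheduler, local worker pool, and so on) is left as an assumption.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def chunk_records(records, chunk_size):
    """Group records into lists of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


# Toy input: the sequences are invented for illustration.
fasta_lines = [
    ">Sjp_0023430",
    "MKTAYIAK",
    "QRQISFVK",
    ">Sjp_0096970",
    "MSHFSRQL",
    ">Sjp_0019830",
    "MLEERKAA",
]

batches = list(chunk_records(read_fasta(fasta_lines), chunk_size=2))
```

Each element of `batches` can be written to its own FASTA file and processed independently, so the total run time scales with the number of workers available rather than with the ~1 million sequences as a single job.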