Question: interproscan summary script
4.1 years ago
UK/Harpenden/Rothamsted Research
rob234king570 wrote:

I have InterProScan output for a new genome annotation, and I've used Blast2GO to look at the GO terms at different levels, but now I want to produce a summary table of the number of proteins with InterPro family domains.


Does anyone have a script or method to produce a summary like Table 2 of this paper (I'll try emailing the authors to ask whether they have a script, and will post it if I get one): 


I could just collect a subset of InterPro IDs, grep for them and count the hits, but I'm wondering whether there is a more comprehensive method that summarizes all proteins with family-level InterPro IDs.
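A minimal sketch of the grep-and-count idea, assuming the standard InterProScan TSV output, where column 1 is the protein accession and column 12 is the InterPro accession (empty or `-` when a signature is not integrated into InterPro). It counts distinct proteins per InterPro ID rather than raw lines, because one protein often matches several member-database signatures for the same entry. The function name `count_proteins_per_ipr` is hypothetical.

```python
import sys
from collections import defaultdict
from typing import Dict, Iterable, Set


def count_proteins_per_ipr(lines: Iterable[str]) -> Dict[str, int]:
    """Count distinct proteins per InterPro accession in InterProScan TSV lines.

    Assumption: column 1 = protein accession, column 12 = InterPro accession.
    """
    proteins: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        row = line.rstrip("\n").split("\t")
        if len(row) < 12:
            continue  # row has no InterPro column at all
        protein, ipr = row[0], row[11].strip()
        if ipr.startswith("IPR"):  # skip empty or "-" (unintegrated signatures)
            proteins[ipr].add(protein)
    return {ipr: len(members) for ipr, members in proteins.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_ipr.py interproscan_output.tsv
    with open(sys.argv[1]) as handle:
        counts = count_proteins_per_ipr(handle)
    for ipr, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(f"{ipr}\t{n}")
```

Counting sets of proteins, not matching lines, is the main difference from a plain `grep -c`.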



I have downloaded the tree relationship file from InterPro (excerpt below). Lines prefixed with -- are children of the nearest entry above them that has one fewer --. For each parent, e.g. IPR015797, I want to count the proteins found for it including its children, and also count each child separately. 

IPR015797::NUDIX hydrolase domain-like::
--IPR000086::NUDIX hydrolase domain::
----IPR020476::NUDIX hydrolase::
--IPR029119::MutY, C-terminal::
IPR015812::Integrin beta subunit::
--IPR012013::Integrin beta-4 subunit::
--IPR015436::Integrin beta-6 subunit::
--IPR015437::Integrin beta-7 subunit::
--IPR015439::Integrin beta-2 subunit::
--IPR015442::Integrin beta-8 subunit::
--IPR027067::Integrin beta-5 subunit::
--IPR027068::Integrin beta-3 subunit::
--IPR027070::Integrin beta-like protein 1::
--IPR027071::Integrin beta-1 subunit::


modified 2.2 years ago by cgbm860 • written 4.1 years ago by rob234king570

Have you had any luck? Could you possibly tell me where you found the tree relationships of the InterPro IDs? Never mind, found them here.

modified 3.1 years ago • written 3.1 years ago by peter0
2.2 years ago
cgbm860 wrote:

Hello Rob, have you sorted out your problem? I want to do the same. Thanks, Cristian

written 2.2 years ago by cgbm860

I use Blast2GO (we have a PRO licence), which has a function to export the InterPro domains as a spreadsheet; I then count them on the Linux command line.
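The counting step could also be done with a short script like the one below. It does not depend on which column holds the InterPro IDs: it only assumes the exported spreadsheet is saved as tab-delimited text with a header row and one protein per row, and it extracts every `IPR` accession (IPR plus six digits) from each row. The exact Blast2GO export layout is not described here, so the one-protein-per-row assumption should be checked against the actual file; `count_ipr_rows` is a hypothetical name.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# InterPro accessions are "IPR" followed by six digits.
IPR_RE = re.compile(r"\bIPR\d{6}\b")


def count_ipr_rows(rows: Iterable[str], skip_header: bool = True) -> Counter:
    """Count, for each InterPro ID, how many rows (proteins) mention it."""
    counts: Counter = Counter()
    for i, row in enumerate(rows):
        if skip_header and i == 0:
            continue
        # set(): a protein listing the same ID twice is still counted once
        counts.update(set(IPR_RE.findall(row)))
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_export.py blast2go_export.txt
    with open(sys.argv[1]) as handle:
        for ipr, n in count_ipr_rows(handle).most_common():
            print(f"{ipr}\t{n}")
```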

written 2.2 years ago by rob234king570
