I used standalone InterProScan 5.14-53.0-64 for functional annotation of the proteins predicted in my assembled genome. It works well and is quick. However, since I am working with whole-genome sequences, the output is very large.
I want to group the protein sequences by function (i.e. by domain) and count the number of sequences in each class or domain type, so that I can present my results graphically. Filtering the sequences by domain is possible in Excel, but filtering one domain at a time and counting thousands of sequences manually is exhausting and would take weeks.
I am not a computer science person, so can anyone help me find a way to automate this work? Are there any command-line tools I can use? Any help will be appreciated.
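
For illustration, the tally described above could be sketched in Python roughly as follows. This is a minimal sketch, not a tested pipeline. It assumes the output was written in InterProScan's tab-separated (TSV) format, where column 1 is the protein accession, column 4 the analysis (member database, e.g. Pfam), column 5 the signature accession and column 6 the signature description. The function names and the file name `interproscan_output.tsv` are hypothetical.

```python
import csv
import sys
from collections import defaultdict


def count_proteins_per_domain(rows, analysis=None):
    """Count distinct proteins per signature.

    rows     -- iterable of lists, one per TSV line (assumed InterProScan TSV layout)
    analysis -- optional member database to keep (e.g. "Pfam"); None keeps all
    Returns a dict mapping (signature accession, description) -> protein count.
    """
    proteins = defaultdict(set)
    for fields in rows:
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        protein_id, member_db = fields[0], fields[3]
        sig_acc, sig_desc = fields[4], fields[5]
        if analysis is not None and member_db != analysis:
            continue
        # A set ensures a protein with several hits to one domain counts once.
        proteins[(sig_acc, sig_desc)].add(protein_id)
    return {key: len(ids) for key, ids in proteins.items()}


def count_from_file(path, analysis=None):
    """Read a TSV file from disk and tally it with count_proteins_per_domain."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return count_proteins_per_domain(reader, analysis)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python count_domains.py interproscan_output.tsv
    counts = count_from_file(sys.argv[1], analysis="Pfam")
    # Print the most frequent domains first, as a tab-separated table.
    for (acc, desc), n in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(f"{acc}\t{desc}\t{n}")
```

The printed table has one line per domain, so it can be opened in Excel and plotted directly. Passing `analysis=None` instead of `"Pfam"` would count signatures from every member database in the file.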