I have a number of protein datasets to which I applied my machine learning method (ProFET) for classification.
The reviewers want a comparison of our method's performance against PSI-BLAST.
I have various sets of protein sequences, each in its own multi-fasta file, with each file corresponding to a functional group or class (e.g. "Neuropeptide" or "Not Neuropeptide").
What I want to do: run an all-vs-all BLAST/PSI-BLAST on the data, cluster the results into as many clusters as there are classes, and then see how well the clusters correspond to the classes. (I'll be doing this with a binary classification test case.)
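To make the first step concrete, here is a rough sketch of how I imagine turning BLAST hits into a distance matrix. As far as I can tell from the BLAST+ documentation, something like `makeblastdb -in all.fasta -dbtype prot -out alldb` followed by `blastp -query all.fasta -db alldb -outfmt 6 -out hits.tsv` should produce a tab-separated hit table with the bit score in the 12th column. The file names, the function name `blast_tab_to_distance`, and the choice to normalise each bit score by the smaller of the two self-hit scores are all my own assumptions, not an established recipe.

```python
import numpy as np


def blast_tab_to_distance(lines, ids):
    """Turn BLAST tabular hits (-outfmt 6) into a symmetric distance matrix.

    `lines` is an iterable of tab-separated hit lines, `ids` the list of
    sequence identifiers in the order wanted for rows and columns.
    """
    index = {seq_id: i for i, seq_id in enumerate(ids)}
    score = np.zeros((len(ids), len(ids)))

    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, subject, bits = fields[0], fields[1], float(fields[11])
        if query not in index or subject not in index:
            continue
        i, j = index[query], index[subject]
        # keep the best-scoring hit for each pair
        score[i, j] = max(score[i, j], bits)

    # BLAST scores are not symmetric, so take the larger of the two directions
    score = np.maximum(score, score.T)

    # Assumption: scale by the smaller self-hit score so similarity lies in [0, 1]
    self_score = np.diag(score).copy()
    norm = np.minimum.outer(self_score, self_score)
    with np.errstate(divide="ignore", invalid="ignore"):
        similarity = np.where(norm > 0, score / norm, 0.0)

    # Pairs with no reported hit end up at the maximum distance of 1
    distance = 1.0 - np.clip(similarity, 0.0, 1.0)
    np.fill_diagonal(distance, 0.0)
    return distance
```

I would call it as `blast_tab_to_distance(open("hits.tsv"), ids)`, where `ids` holds the first token of each FASTA header. I don't know whether this normalisation is sensible for a reviewer-facing comparison, so corrections are welcome.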
I've never used BLAST locally, and I don't know of any tools for doing this quickly. I just need to run the all-vs-all BLAST, derive clusters from the resulting distance matrix, and get the cluster assignments (and preferably some statistics).
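For the clustering and statistics part, this is a minimal sketch of what I have in mind, assuming I already have a symmetric distance matrix with a zero diagonal and the true class label of every sequence. The function name `cluster_and_score`, the average-linkage method, and the adjusted Rand index as the agreement statistic are my own choices for illustration; other linkage methods or scores might be more appropriate.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.metrics import adjusted_rand_score
from sklearn.metrics.cluster import contingency_matrix


def cluster_and_score(dist, true_labels, n_clusters=2):
    """Cut a hierarchical tree into n_clusters and compare with known classes."""
    dist = np.asarray(dist, dtype=float)

    # linkage() wants the condensed (upper-triangle) form of the matrix
    condensed = squareform(dist)
    tree = linkage(condensed, method="average")

    # "maxclust" cuts the tree into at most n_clusters groups, labelled from 1
    assignments = fcluster(tree, t=n_clusters, criterion="maxclust")

    # 1.0 means clusters match classes exactly; values near 0 mean chance level
    ari = adjusted_rand_score(true_labels, assignments)

    # rows are true classes, columns are clusters, entries are sequence counts
    table = contingency_matrix(true_labels, assignments)
    return assignments, ari, table
```

With two classes I would call `cluster_and_score(dist, labels, n_clusters=2)` and report the adjusted Rand index together with the class-by-cluster count table.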
My whole pipeline is in scikit-learn / Python. (I'm a programming n00b.)
Anything simple and fast would be great. Emphasis on simple, I just need this as a one-off.
Thank you very much!