Question: Getting BLAST Clusters Classifier performance for Protein classes
5.8 years ago by
ddofer30 wrote:

I have a number of protein datasets that I classified with my machine learning method (ProFET).
The reviewers want a comparison of our method's performance vs. PSI-BLAST.
I have various sets of protein sequences, each in its own multi-FASTA file, with each file corresponding to a functional group or class (e.g. "Neuropeptide" or "Not Neuropeptide").

What I want to do: run an all-vs-all BLAST/PSI-BLAST on the data, build as many clusters as there are classes, then see how well the clusters correspond to each class. (I'll be doing this with a binary classification test case.)
I've never used BLAST locally, and I don't know of any tools for doing this quickly. I just need to run the all-vs-all BLAST, get clusters from the distance matrix, then get the cluster assignments (and preferably the statistics).
My whole pipeline is in Python with scikit-learn. (I'm a programming n00b.)

Anything simple and fast would be great. Emphasis on simple: I just need this as a one-off.

Thank you very much!

5.8 years ago by
mark.ziemann1.3k wrote:

Hi ddofer, you will need to install BLAST+ from NCBI for your OS; it includes PSI-BLAST and all the other BLAST flavours. Check out the docs for more info. Running this BLAST job on Linux would look like this:

makeblastdb -in proteins.fa -dbtype prot
psiblast -db proteins.fa -query proteins.fa -out result.txt -outfmt 6 -max_target_seqs 500 -num_threads 8

Where proteins.fa is your multi-FASTA file (for an all-vs-all search, concatenate your per-class files into it), makeblastdb builds the protein database that -db points to, result.txt is your output file, the output format is tabular (-outfmt 6), the maximum number of hits reported per query is 500, and the search runs with 8 threads. You can modify all of these parameters depending on your system and desired output.
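Since the rest of your pipeline is in Python, here is a minimal sketch of the clustering step, using only the standard library. It assumes the default 12-column tabular output (query ID in column 1, subject ID in column 2, bit score in column 12). The function names (read_best_bitscores, single_linkage, purity), the toy sequence IDs and the toy scores are all made up for illustration. Using the bit score as the similarity and single-linkage merging are my own simplifications, not the only reasonable choices.

```python
import csv
import io
from collections import Counter, defaultdict


def read_best_bitscores(handle):
    """Return {(id_a, id_b): best bit score} for unordered pairs, skipping self hits."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query == subject:
            continue
        pair = tuple(sorted((query, subject)))
        best[pair] = max(bitscore, best.get(pair, 0.0))
    return best


def single_linkage(ids, scores, n_clusters):
    """Merge the highest-scoring pairs first until n_clusters groups remain.

    Pairs with no BLAST hit are never merged, so more than n_clusters
    groups can be left over if the hit graph is too sparse.
    """
    parent = {seq_id: seq_id for seq_id in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    remaining = len(parent)
    for (a, b), _score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if remaining <= n_clusters:
            break
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            remaining -= 1
    return {seq_id: find(seq_id) for seq_id in ids}


def purity(clusters, labels):
    """Fraction of sequences that share the majority class of their cluster."""
    members = defaultdict(list)
    for seq_id, cluster_id in clusters.items():
        members[cluster_id].append(labels[seq_id])
    majority = sum(Counter(v).most_common(1)[0][1] for v in members.values())
    return majority / len(clusters)


if __name__ == "__main__":
    # Toy stand-in for result.txt; IDs and scores are invented.
    hits = [("np1", "np2", 150.0), ("np2", "np3", 120.0),
            ("other1", "other2", 90.0), ("np3", "other1", 30.0)]
    text = "".join(
        f"{q}\t{s}\t90.0\t50\t5\t0\t1\t50\t1\t50\t1e-20\t{b}\n" for q, s, b in hits
    )
    labels = {"np1": "Neuropeptide", "np2": "Neuropeptide", "np3": "Neuropeptide",
              "other1": "Not Neuropeptide", "other2": "Not Neuropeptide"}
    scores = read_best_bitscores(io.StringIO(text))
    clusters = single_linkage(labels, scores, n_clusters=2)
    print(purity(clusters, labels))
```

For real data you would replace the toy string with open("result.txt") and build the labels dictionary from which FASTA file each sequence came from. If you prefer to stay inside scikit-learn, you could instead fill a full distance matrix from the same scores and pass it to a clustering class that accepts precomputed distances, then score the result with the clustering metrics you already use.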

