I'm trying to classify reads using libSVM (tetramer frequencies).
I have a trained model, but I can't find what the input file format for svm-predict should be. Sequences of unknown origin shouldn't need a label at the beginning of the vector, yet if I leave it out, svm-predict still prints "Classification=..." as if it were testing the model. I think there should be a way to "tell" svm-predict that you're doing "actual" prediction rather than testing...
I'm new to libSVM, so please tell me if I'm wrong at some point...
LIBSVM contains three programs for three specific applications:
svm-train : Use this program to train a model from data with class labels.
svm-predict : Once you have generated the model, use svm-predict with feature vectors as input (no class labels required); svm-predict will use the model and the input feature vectors to predict the class.
svm-scale : This is important to avoid feature bias. It can be used to scale data to a restricted range as preprocessing for SVM training.
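As a rough illustration of what the scaling step does, here is a minimal Python sketch of linear scaling to [-1, 1], which is svm-scale's default range. The function names fit_ranges and scale_row are made up for this example, the rows are assumed to be dense lists of feature values, and mapping a constant feature to 0 is a choice of this sketch rather than svm-scale's behaviour. The key point is that the ranges are computed from the training data and then reused unchanged on the reads to be classified (svm-scale offers -s to save and -r to restore its scaling parameters for that purpose).

```python
def fit_ranges(rows):
    """Column-wise (min, max) over dense training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale_row(row, ranges, lower=-1.0, upper=1.0):
    """Linearly map each feature into [lower, upper] using the training ranges."""
    scaled = []
    for value, (lo, hi) in zip(row, ranges):
        if hi == lo:
            # Constant feature in the training data: this sketch maps it to 0.
            scaled.append(0.0)
        else:
            scaled.append(lower + (upper - lower) * (value - lo) / (hi - lo))
    return scaled


# Fit on the training vectors, then apply the same ranges to new vectors.
training_rows = [[0.0, 2.0], [4.0, 2.0]]
ranges = fit_ranges(training_rows)
scaled_new_row = scale_row([2.0, 2.0], ranges)
```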
I understand that you have already created your model and are having a problem with the input file format. The format should match that of the input files you used to generate the model. Usually the input file is a text file with the features derived from the sequences.
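For reference, each line of a LIBSVM input file has the form "<label> <index>:<value> <index>:<value> ...", with indices in ascending order starting at 1. Below is a minimal Python sketch of turning a read into such a line using tetramer frequencies. The names TETRAMERS, tetramer_frequencies and to_libsvm_line are invented for this example, and the details (fixed alphabetical tetramer order, skipping windows with ambiguous bases, no merging of reverse complements) are assumptions; whatever you do here has to match exactly how the training vectors were built.

```python
from itertools import product

# All 256 tetramers in a fixed order; feature index i + 1 maps to TETRAMERS[i].
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {kmer: i + 1 for i, kmer in enumerate(TETRAMERS)}  # LIBSVM indices start at 1


def tetramer_frequencies(seq):
    """Return {feature_index: relative frequency} for one read."""
    seq = seq.upper()
    counts = {}
    total = 0
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])
        if idx is None:  # skip windows containing N or other ambiguous bases
            continue
        counts[idx] = counts.get(idx, 0) + 1
        total += 1
    return {idx: n / total for idx, n in counts.items()} if total else {}


def to_libsvm_line(label, features):
    """Format one vector as '<label> <index>:<value> ...' with ascending indices."""
    pairs = " ".join(f"{idx}:{val:.6g}" for idx, val in sorted(features.items()))
    return f"{label} {pairs}".rstrip()


# Example: one read, written with label 0.
line = to_libsvm_line(0, tetramer_frequencies("ACGTA"))
```

Only non-zero features are written, since the format is sparse and missing indices are treated as zero.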
If you are looking for a tutorial on libsvm, the official tutorial and FAQ are the best.
I emailed libSVM's author and I thought it would be good to share with you the answer to my question...
He told me that when you're doing the "actual" prediction, you just put random numbers as labels. It will still print out the "Accuracy=..." statement, which will, of course, be meaningless; the only thing that matters is svm-predict's output file containing the classification results.
See also the following Q&A from the LIBSVM FAQ:
Q: I don't know class labels of test data. What should I put in the first column of the test file?
A: Any value is ok. In this situation, what you will use is the output file of svm-predict, which gives predicted class labels.
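In practice this could look like the following Python sketch, which puts a placeholder label in front of each unlabeled "index:value ..." line and later reads the predicted labels back from svm-predict's output file. The function names and the placeholder value 0 are made up for this example, and the parser assumes probability estimates (-b 1) are not enabled, so the output file holds one predicted label per input line.

```python
def add_placeholder_labels(lines, placeholder="0"):
    """Prefix every unlabeled 'index:value ...' line with a dummy label."""
    return [f"{placeholder} {line.strip()}".rstrip() for line in lines]


def read_predictions(output_text):
    """Parse svm-predict's output file: one predicted label per input line."""
    return [row.strip() for row in output_text.splitlines() if row.strip()]


# Example with two unlabeled vectors and a made-up output file content.
labeled = add_placeholder_labels(["1:0.5 3:0.25\n", "2:1\n"])
predicted = read_predictions("1\n2\n")
```

The labeled lines would be written to a file and passed as "svm-predict test_file model_file output_file"; the "Accuracy=..." line printed to the screen is computed against the placeholder labels and can be ignored.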
Does svm-predict ALWAYS output the "Accuracy=xx%" line? In my case it does and it looks like it does testing (and not prediction of unknown data).
If I put no value at the beginning of the line, then it parses the first integer of each line (in my case the index of the first index:value pair) as the label and gives a nonsensical accuracy percentage...
Looks like some issue with your svm-predict. Which version are you using?
I'm using version 2.91...
I thought you were asking because of problems with your model or issues with the input file. Glad that you got a proper response from the libSVM authors.