Question

Svm-Predict Input File Format

4

14.0 years ago

Panos ★ 1.8k

I'm trying to classify reads using libSVM (tetramer frequencies).

I have a trained model, but I can't find what the input file format for svm-predict should be. Sequences of unknown origin shouldn't need a label at the beginning of the vector, but if I leave it out, svm-predict still prints "Classification=..." as if it were testing the model. I think there should be a way to tell svm-predict that you're not testing but doing "actual" prediction...

I'm new to libSVM, so please tell me if I'm wrong at some point...

short classification metagenomics • 14k views

ADD COMMENT • link 14.0 years ago by Panos ★ 1.8k

score 4 · Answer 1 · 2010-05-05

4

14.0 years ago

Khader Shameer 18k

LIBSVM contains three programs, each for a specific application:

svm-train : Use this program to train a model on your data with class labels.
svm-predict : Once you have generated the model, use svm-predict with feature vectors as input (no class labels required; svm-predict will use the model and the input feature vectors to predict the class).
svm-scale : This is important to avoid feature bias. It can be used to scale data to a restricted range as preprocessing for SVM training.
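
To illustrate what scaling to a restricted range means, here is a minimal Python sketch of per-feature linear scaling to [-1, 1]. It is not svm-scale's actual code: the function names `fit_ranges` and `scale_row` are hypothetical, and the sketch simplifies by ignoring features that are absent from a sparse row.

```python
def fit_ranges(rows):
    """Collect the (min, max) of every feature index seen in the training rows.

    rows: list of {feature_index: value} dicts (hypothetical in-memory layout).
    """
    ranges = {}
    for row in rows:
        for idx, val in row.items():
            lo, hi = ranges.get(idx, (val, val))
            ranges[idx] = (min(lo, val), max(hi, val))
    return ranges


def scale_row(row, ranges, lower=-1.0, upper=1.0):
    """Linearly map each value from [min, max] of its feature to [lower, upper]."""
    scaled = {}
    for idx, val in row.items():
        lo, hi = ranges.get(idx, (val, val))
        if hi == lo:
            # Constant (or unseen) feature: nothing to scale, so drop it.
            continue
        scaled[idx] = lower + (upper - lower) * (val - lo) / (hi - lo)
    return scaled
```

The ranges should come from the training data only and then be reused unchanged for the data you want to predict, so that both sets go through the same transformation.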

I understand that you have already created your model and are having a problem with the input file format. It should match the format of the input files you used to generate the model. Usually the input file is a text file with the features derived from the sequences.
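
For reference, LIBSVM's data format is one line per sample: a label followed by space-separated index:value pairs, with indices in ascending order. The Python sketch below shows one way tetramer frequencies might be turned into such a line. The feature numbering (alphabetical tetramer order, starting at 1) and the function names are assumptions made for illustration; whatever numbering you choose must be the same one used for the training file.

```python
from itertools import product

# All 256 DNA tetramers in alphabetical order. The feature index is the
# position in this list plus 1 (an assumed numbering, not a LIBSVM rule).
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
TETRAMER_INDEX = {kmer: i + 1 for i, kmer in enumerate(TETRAMERS)}


def tetramer_frequencies(seq):
    """Return {feature_index: relative frequency} for the tetramers in seq."""
    seq = seq.upper()
    counts = {}
    total = 0
    for i in range(len(seq) - 3):
        idx = TETRAMER_INDEX.get(seq[i:i + 4])
        if idx is None:
            # Window contains N or another ambiguous base: skip it.
            continue
        counts[idx] = counts.get(idx, 0) + 1
        total += 1
    if total == 0:
        return {}
    return {idx: n / total for idx, n in counts.items()}


def to_libsvm_line(label, features):
    """Format one sample as 'label index:value index:value ...'."""
    pairs = " ".join(f"{idx}:{val:.6g}" for idx, val in sorted(features.items()))
    return f"{label} {pairs}".rstrip()
```

For example, `to_libsvm_line(1, tetramer_frequencies(read))` would give one line of a training file for a read known to belong to class 1.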

If you are looking for a tutorial on libsvm, the official tutorial and FAQ are the best.

ADD COMMENT • link 14.0 years ago by Khader Shameer 18k

1

Does svm-predict ALWAYS output the "Accuracy=xx%" line? In my case it does and it looks like it does testing (and not prediction of unknown data).

If I put no value at the beginning of the line, it parses the first integer of each line (in my case, the index of the first index:value pair) as the label and gives a nonsensical accuracy percentage...

ADD REPLY • link 14.0 years ago by Panos ★ 1.8k

0

Looks like some issue with your svm-predict. Which version are you using?

ADD REPLY • link 14.0 years ago by Khader Shameer 18k

0

I'm using version 2.91...

ADD REPLY • link 14.0 years ago by Panos ★ 1.8k

0

I thought you were asking because of some problem with your model or an issue with the input file. Glad that you got a proper response from the libSVM authors.

ADD REPLY • link 14.0 years ago by Khader Shameer 18k

score 2 · Answer 2 · 2010-05-06

I emailed libSVM's author and I thought it would be good to share with you the answer to my question...

He told me that when you're doing the "actual" prediction, you just put random numbers as labels. It will still print out the "Accuracy=..." statement, which will, of course, be meaningless; the only thing that matters is svm-predict's output file containing the classification results.

See also the following Q&A from the LIBSVM FAQ:

Q: I don't know class labels of test data. What should I put in the first column of the test file?

A: Any value is ok. In this situation, what you will use is the output file of svm-predict, which gives predicted class labels.
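
In practice this could look like the Python sketch below: write every unknown sample with a placeholder label, run svm-predict, then read the predicted labels back from its output file and ignore the printed accuracy. The function names, the placeholder value 0 and the file names in the comments are assumptions for illustration, and the parser assumes the default output of one predicted label per line (no probability estimates).

```python
def format_unlabeled(feature_dicts, placeholder_label=0):
    """Build svm-predict input lines for samples whose true class is unknown.

    feature_dicts: list of {feature_index: value} dicts.
    The placeholder label only satisfies the file format; any value works.
    """
    lines = []
    for features in feature_dicts:
        pairs = " ".join(f"{i}:{v:.6g}" for i, v in sorted(features.items()))
        lines.append(f"{placeholder_label} {pairs}".rstrip())
    return lines


def parse_predictions(text):
    """Return the predicted labels from the contents of the output file."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Typical use (file names are hypothetical):
#   1. write "\n".join(format_unlabeled(samples)) + "\n" to unknown.txt
#   2. run: svm-predict unknown.txt reads.model predictions.txt
#   3. labels = parse_predictions(open("predictions.txt").read())
```

The i-th predicted label corresponds to the i-th line of the input file, so keep the read identifiers in the same order if you need to map the predictions back to sequences.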