What is the format of input test data in svmlight classifier? How to generate it?
9.5 years ago

I am using the SVM classifier SVMlight.

The sample example takes an input test file in this format:

4 qid:4 1:1 2:0 3:0 4:0.2 5:1
3 qid:4 1:1 2:1 3:0 4:0.3 5:0
2 qid:4 1:0 2:0 3:0 4:0.2 5:1
1 qid:4 1:0 2:0 3:1 4:0.2 5:0
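Each line has three parts, separated by spaces. First comes a target value. Next is an optional `qid:<n>` query id, which SVMlight uses mainly in its ranking mode. The rest are `index:value` feature pairs. According to the SVMlight documentation, feature numbers must appear in increasing order, and zero-valued features may be left out. The Python sketch below splits one line into these parts and can be used to check files you generate yourself. The function name `parse_svmlight_line` is hypothetical and not part of SVMlight.

```python
def parse_svmlight_line(line):
    """Split one SVMlight-format line into (target, qid, features).

    A trailing '# comment' is dropped; qid is None when the line has none.
    Raises ValueError if feature numbers are not strictly increasing.
    """
    tokens = line.split("#", 1)[0].split()
    target = float(tokens[0])
    qid = None
    features = {}
    last_index = 0
    for token in tokens[1:]:
        key, value = token.split(":", 1)
        if key == "qid":
            qid = int(value)
            continue
        index = int(key)
        if index <= last_index:
            raise ValueError(f"feature {index} is out of order")
        features[index] = float(value)
        last_index = index
    return target, qid, features


# The first line of the sample test file above:
print(parse_svmlight_line("4 qid:4 1:1 2:0 3:0 4:0.2 5:1"))
# prints (4.0, 4, {1: 1.0, 2: 0.0, 3: 0.0, 4: 0.2, 5: 1.0})
```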

But in general we want to classify plain input text, so how is the above format produced?

In other words, how do I convert plain input text into this specific format?

Assuming you want to classify DNA, RNA or protein sequences (otherwise this question belongs on StackOverflow), the first step is to build a feature dictionary. The simplest option is a k-mer dictionary: for DNA and k=4 this would be AAAA, AAAT, AAAG, AAAC, AATA, ..., 4^4 = 256 features in total. If k-mer #1 (AAAA) is present in your sequence, set feature 1 to 1 (1:1); if not, set it to 0 (1:0), and so on for every k-mer. If your sequences contain ambiguous letters, e.g. K (G or T) in AAAK, you can use fractional weights instead of 0/1, so AAAK contributes AAAG:0.5 and AAAT:0.5.
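The k-mer encoding described above can be sketched in Python as follows. This is a minimal illustration under a few assumptions. The function names and the partial IUPAC ambiguity table are hypothetical. Bases are ordered A, T, G, C so that AAAA is feature 1, AAAT is feature 2, and so on, matching the numbering above. When a k-mer is seen more than once, its value is kept as a presence weight (the maximum) rather than a count. Zero-valued features are omitted, which SVMlight's sparse format allows.

```python
from functools import lru_cache
from itertools import product

BASES = "ATGC"  # order chosen so AAAA=1, AAAT=2, AAAG=3, AAAC=4, AATA=5, ...
# Assumed subset of IUPAC ambiguity codes and the bases they stand for.
AMBIGUITY = {"K": "GT", "M": "AC", "R": "AG", "Y": "CT",
             "S": "CG", "W": "AT", "N": "ACGT"}


@lru_cache(maxsize=None)
def kmer_index(k):
    """Map every DNA k-mer to a 1-based feature number (SVMlight starts at 1)."""
    return {"".join(p): i for i, p in enumerate(product(BASES, repeat=k), start=1)}


def expand(kmer):
    """Expand a k-mer with ambiguous letters into (concrete k-mer, weight) pairs."""
    options = [AMBIGUITY.get(c, c) for c in kmer]
    combos = list(product(*options))
    weight = 1.0 / len(combos)
    return [("".join(c), weight) for c in combos]


def to_svmlight(target, seq, k=4, qid=None):
    """Encode one sequence as an SVMlight line of k-mer presence features."""
    index = kmer_index(k)
    features = {}
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        for kmer, weight in expand(seq[start:start + k]):
            if kmer in index:  # skip k-mers with letters outside the table
                i = index[kmer]
                features[i] = max(features.get(i, 0.0), weight)
    parts = [str(target)]
    if qid is not None:
        parts.append(f"qid:{qid}")
    # Feature numbers must be written in increasing order.
    parts += [f"{i}:{v:g}" for i, v in sorted(features.items())]
    return " ".join(parts)


print(to_svmlight(1, "AAAAT"))  # prints 1 1:1 2:1
print(to_svmlight(-1, "AAAK"))  # prints -1 2:0.5 3:0.5
```

Writing one such line per sequence gives a file in the same layout as the sample above. For proteins, the same idea applies with the 20 amino-acid letters in place of `BASES`, though the feature space grows as 20^k.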
