Assuming you want to classify DNA/RNA/protein sequence input (otherwise this question should be posted on Stack Overflow), the first thing to do is to build your dictionary. The simplest option is a k-mer dictionary: for a DNA sequence and k=4 this would be AAAA, AAAT, AAAG, AAAC, AATA, ..., for 4^4 = 256 features in total. If k-mer #1 (AAAA) is present in your sequence, set feature 1 to 1 (1:1); if not, set it to 0 (1:0), and so on for the remaining k-mers. If you have ambiguous letters, e.g. K (G or T) in AAAK, you can use weights instead of 0/1 and split the weight evenly between the possible k-mers, so AAAK gives AAAG:0.5 and AAAT:0.5.
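As a rough illustration of the encoding described above, here is a minimal Python sketch. The function names (`build_kmer_index`, `kmer_features`, `to_sparse_line`) are made up for this example, and a few choices are my own assumptions rather than anything prescribed: the alphabet order A, T, G, C is picked only to match the enumeration above, ambiguity codes follow the standard IUPAC table, and when the same k-mer is hit by several windows the largest weight is kept.

```python
from itertools import product

# Assumption: order A, T, G, C so that features are numbered
# AAAA=1, AAAT=2, AAAG=3, AAAC=4, AATA=5, ...
ALPHABET = "ATGC"

# Standard IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "T": "T", "G": "G", "C": "C",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def build_kmer_index(k):
    """Map every possible k-mer to a 1-based feature number."""
    kmers = ("".join(p) for p in product(ALPHABET, repeat=k))
    return {kmer: i for i, kmer in enumerate(kmers, start=1)}


def kmer_features(seq, k, index):
    """Return {feature_number: weight} for the k-mers found in seq.

    An ambiguous window is expanded into all concrete k-mers it could
    represent, each weighted 1 / (number of expansions). Letters that
    are not in the IUPAC table raise KeyError.
    """
    features = {}
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        options = [IUPAC[base] for base in window]
        expansions = ["".join(p) for p in product(*options)]
        weight = 1.0 / len(expansions)
        for kmer in expansions:
            feat = index[kmer]
            # Assumption: keep the strongest evidence seen for this k-mer.
            features[feat] = max(features.get(feat, 0.0), weight)
    return features


def to_sparse_line(features):
    """Format as 'feature:value' pairs; absent k-mers (value 0) are omitted."""
    return " ".join(f"{i}:{w:g}" for i, w in sorted(features.items()))
```

With `index = build_kmer_index(4)`, calling `to_sparse_line(kmer_features("AAAK", 4, index))` gives `2:0.5 3:0.5`, i.e. AAAT and AAAG each at weight 0.5. If your classifier expects a dense vector, fill every feature number missing from the dictionary with 0.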