I would like to use machine learning techniques such as Naive Bayes and SVM in Weka to identify species from DNA sequence data. The issue is that I first have to convert the DNA sequences into numerical vectors.
My sequences look like this:
------------------------------------------------G
------------------------------------------GGAGATG
------------------------------------------GGAGATG
------------------------------------------GGAGATG
TTATTAATTCGAGCAGAATTAGGAAATCCTGGATCTTTAATTGGTGATG
----------------------------------------------ATG
CTATTAATTCGAGCTGAGCTAAGCCAGCCCGGGGCTCTGCTCGGAGATG
-----------------------TCAACCTGGGGCCCTACTCGGAGACG
----TAATCCGAGCAGAATTAAGCCAACCTGGCGCCCTACTAGGGGATG
CTATTAATTCGAGCTGAGCTAAGCCAGCCTGGGGCTCTGCTCGGAGATG
TTATTAATTCGTTTTGAGTTAGGCACTGTTGGAGTTTTATTAG---ATA
How can I do this? Can anyone suggest programs other than Weka for doing machine learning with DNA sequences?
That program does not work for me. It has no manual and no examples, and it is very complicated to use.
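For the conversion step asked about above, one possible approach is one-hot encoding: each aligned position becomes four 0/1 values (one per base A, C, G, T), and a gap becomes all zeros. The sketch below is only an illustration under assumptions that are not in the post: the sequences are already aligned to equal length, the species labels (species_a, species_b) are made up, and the function names one_hot and to_arff are hypothetical helpers, not part of Weka. The output is plain ARFF text, which is the file format Weka reads.

```python
ALPHABET = "ACGT"  # gaps ('-') and any other symbol are encoded as all zeros


def one_hot(sequence):
    """Encode one aligned sequence as a flat list of 0/1 values, 4 per position."""
    vector = []
    for base in sequence.upper():
        vector.extend(1 if base == letter else 0 for letter in ALPHABET)
    return vector


def to_arff(sequences, labels, relation="dna"):
    """Build ARFF text: one numeric attribute per (position, base) plus a nominal class."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    lines = ["@relation " + relation, ""]
    for pos in range(length):
        for letter in ALPHABET:
            lines.append("@attribute pos%d_%s numeric" % (pos + 1, letter))
    classes = sorted(set(labels))
    lines.append("@attribute species {%s}" % ",".join(classes))
    lines += ["", "@data"]
    for seq, label in zip(sequences, labels):
        lines.append(",".join(str(v) for v in one_hot(seq)) + "," + label)
    return "\n".join(lines) + "\n"


# Short fragments in the style of the sequences above; the labels are invented.
sequences = ["----ATG", "GGAGATG", "GGAGACG"]
labels = ["species_a", "species_a", "species_b"]

vectors = [one_hot(s) for s in sequences]
arff_text = to_arff(sequences, labels)
```

Writing arff_text to a file with the .arff extension should give something Weka can open. Treating a gap as all zeros is a design choice in this sketch; adding a fifth column for gaps is an equally reasonable alternative.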