I want to build a machine learning classifier using a k-mer approach like RDP (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1950982/), but with a different machine learning algorithm. The reason for building a new classifier is that RDP only classifies down to genus level, and I need species-level assignments.
What I can do/have done: download reference sequences (~300,000 sequences of about 900 bp each) > filter > annotate with taxonomic information > create k-mer library
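For context, here is a minimal sketch of what the k-mer step amounts to. It is written in Python purely for illustration and is not the actual pipeline: the function names `kmer_counts` and `feature_vector` are hypothetical, and the value of k is an assumption (RDP uses 8-base words). Each sequence becomes a vector of overlapping k-mer counts, so the feature space has 4^k columns.

```python
from collections import Counter
from itertools import product

VALID_BASES = set("ACGT")


def kmer_counts(sequence, k):
    """Count overlapping k-mers, skipping windows with ambiguous bases."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= VALID_BASES:
            counts[kmer] += 1
    return counts


def feature_vector(sequence, k):
    """Return counts for all 4**k possible k-mers in a fixed order."""
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = kmer_counts(sequence, k)
    return [counts[kmer] for kmer in vocabulary]


if __name__ == "__main__":
    # Toy example with k=2; a real run would use a larger k.
    print(kmer_counts("ACGTAC", 2))
    print(len(feature_vector("ACGTAC", 2)))
```

With k = 8 this gives 65,536 features per sequence, so a dense matrix for ~300,000 sequences is large; a sparse representation is probably needed.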
What I am stuck on: setting up machine learning
Problem: Training Support Vector Machines (radial kernel) and Random Forests is so slow that it effectively never finishes. I have enabled parallel computing, but the parallelism seems to be applied only across the repeated LOOCV resamples, not within the learning algorithm itself.
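To put the resampling cost in numbers, here is a rough back-of-the-envelope sketch in Python. It assumes that LOOCV fits one model per left-out sample, and the parameters `repeats`, `n_param_settings` and `n_workers` are hypothetical placeholders for whatever the actual setup uses.

```python
import math


def loocv_model_fits(n_samples, repeats=1, n_param_settings=1):
    """Total model fits: one per left-out sample, per repeat, per tuning setting."""
    return n_samples * repeats * n_param_settings


def fits_per_worker(total_fits, n_workers):
    """Fits each worker must run if resamples are spread evenly across workers."""
    return math.ceil(total_fits / n_workers)


if __name__ == "__main__":
    total = loocv_model_fits(n_samples=300_000)
    print(total)                       # total fits for a single pass
    print(fits_per_worker(total, 64))  # 64 workers is an arbitrary example
```

Even spread over many workers, each worker still has to train thousands of full-size models, which matches the behaviour described above.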
Does anybody have an idea what I can try next? I have access to an R shell running on a cluster where the doSNOW package is available. I also have a fairly fast machine locally.