4.9 years ago
kevin.l.yang
I am trying to create a support vector regression model that assigns a score to a DNA sequence. I am using the kebabs package from Bioconductor (https://bioconductor.org/packages/release/bioc/html/kebabs.html). Here is my code:
library(kebabs)
library(Biostrings)
fastas = readDNAStringSet('train.fa')
scores = read.csv('train_scores.csv', sep = '\t', header = FALSE)
specKlin = spectrumKernel(k = 5:7, distWeight = linWeight(sigma = 72))
specKexp = spectrumKernel(k = 5:7, distWeight = expWeight(sigma = 72))
allspecK = c(specKlin, specKexp)
nus = c(.5, .6, .7, .8)
model <- kbsvm(x = fastas, y = scores, kernel = allspecK, pkg = 'e1071', svm = 'nu-svr', nu = nus, showProgress = TRUE)
However, I get this output:
Grid Search Progress:
Kernel_1 Error: cannot allocate vector of size 1557.2 Gb
In addition: Warning message:
grid search without cross validation (cross=0)
My FASTA file has more than 400,000 DNA sequences, and the code only works when I use about 10,000 or fewer of them. Changing parameters such as k, or training a single model instead of grid searching, results in the same error when I use the full FASTA file. Is there any way to work around this error?
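For scale, here is a rough memory estimate in base R. It rests on assumptions that are not confirmed from the kebabs internals: that the failing allocation is a dense n x n kernel matrix, that each entry is an 8-byte double, and that the "Gb" in R's error message means 1024^3 bytes. The function name kernel_matrix_gib is made up for this sketch.

```r
# Back-of-the-envelope memory estimate for a dense n x n kernel matrix.
# Assumption: one 8-byte double per entry (not verified against kebabs internals).
bytes_per_double <- 8
gib <- 2^30  # assumed unit behind the "Gb" in R's allocation error

# Hypothetical helper: memory in GiB for an n x n matrix of doubles
kernel_matrix_gib <- function(n) n^2 * bytes_per_double / gib

gib_400k <- kernel_matrix_gib(400000)  # lower bound on the full data set
gib_10k  <- kernel_matrix_gib(10000)   # subset size that still works

# Invert the estimate: which n would need the 1557.2 Gb from the error message?
n_implied <- sqrt(1557.2 * gib / bytes_per_double)

print(c(gib_400k = gib_400k, gib_10k = gib_10k, n_implied = n_implied))
```

Under these assumptions, 10,000 sequences need well under 1 GiB, 400,000 sequences need more than 1 TiB, and the 1557.2 Gb in the error corresponds to roughly 457,000 sequences, which fits "more than 400,000". Because the requirement grows with the square of the number of sequences, it would not depend on k or on whether a grid search is run, which is consistent with the error persisting when those are changed.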