
Sequence-kernel SVM in kebabs: memory error


6.5 years ago

kevin.l.yang

I am trying to build a support vector regression model that assigns a score to a DNA sequence. I am using the kebabs package from Bioconductor (https://bioconductor.org/packages/release/bioc/html/kebabs.html). Here is my code:

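```r
library(kebabs)
library(Biostrings)

# Training sequences (more than 400,000 in the full data set)
fastas <- readDNAStringSet('train.fa')

# Tab-separated file without a header; the first column is assumed to hold the scores
scores <- read.csv('train_scores.csv', sep = '\t', header = FALSE)
y <- scores[[1]]  # kbsvm expects a numeric response vector, not a data frame

# Distance-weighted spectrum kernels for k = 5, 6, 7 (linear and exponential weighting)
specKlin <- spectrumKernel(k = 5:7, distWeight = linWeight(sigma = 72))
specKexp <- spectrumKernel(k = 5:7, distWeight = expWeight(sigma = 72))
allspecK <- c(specKlin, specKexp)

# Candidate nu values for the grid search
nus <- c(.5, .6, .7, .8)

model <- kbsvm(x = fastas, y = y, kernel = allspecK, pkg = 'e1071',
               svm = 'nu-svr', nu = nus, showProgress = TRUE)
```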
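For context on the error: a spectrum kernel compares two sequences through their k-mer counts, and an SVM trained on a kernel matrix needs that value for every pair of training sequences. The base-R sketch below is illustrative only. `kmerCounts` and `spectrumK` are hypothetical helpers, not kebabs functions, and they compute the plain unnormalized spectrum kernel for a single k, without the distance weighting (`distWeight`) or any normalization used in the model above.

```r
# Hypothetical helper: count every k-mer (overlapping substring of length k) in one sequence.
kmerCounts <- function(s, k) {
  starts <- seq_len(nchar(s) - k + 1)
  table(substring(s, starts, starts + k - 1))
}

# Hypothetical helper: unweighted, unnormalized spectrum kernel, i.e. the inner
# product of the two k-mer count vectors (only k-mers present in both contribute).
spectrumK <- function(s1, s2, k) {
  c1 <- kmerCounts(s1, k)
  c2 <- kmerCounts(s2, k)
  shared <- intersect(names(c1), names(c2))
  sum(as.numeric(c1[shared]) * as.numeric(c2[shared]))
}

# A kernel matrix holds this value for every pair of sequences: n^2 entries.
seqs <- c("ACGTACGT", "ACGTTTTT", "GGGGACGT")
K <- outer(seq_along(seqs), seq_along(seqs),
           Vectorize(function(i, j) spectrumK(seqs[i], seqs[j], k = 3)))

# K[1, 2] is 4: the shared 3-mers are ACG (2 x 1) and CGT (2 x 1).
K
```

With n training sequences the matrix has n^2 entries, so its memory grows quadratically with the size of the training set rather than linearly.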

However, I get this output:

```
Grid Search Progress:
Kernel_1  Error: cannot allocate vector of size 1557.2 Gb
In addition: Warning message:
grid search without cross validation (cross=0)
```
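The size in the error message is consistent with a single dense n × n matrix of 8-byte doubles (for example, a kernel matrix over all training sequences), which needs 8n² bytes. The sketch below does that arithmetic in R; the helper names are hypothetical, and it assumes R reports "Gb" in units of 1024³ bytes.

```r
# Memory (in units of 1024^3 bytes) for a dense n x n matrix of 8-byte doubles.
kernelMatrixGb <- function(n) 8 * n^2 / 1024^3

# Inverse: number of sequences n whose n x n double matrix has the given size.
seqsForGb <- function(gb) sqrt(gb * 1024^3 / 8)

round(kernelMatrixGb(c(10000, 400000)), 1)  # 0.7 and 1192.1
round(seqsForGb(1557.2))                    # 457169
```

Inverting the reported 1557.2 Gb gives roughly 457,000 sequences (the 0.1 Gb rounding leaves a small uncertainty), while a subset of 10,000 sequences needs only about 0.7 Gb for the same kind of matrix.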

My FASTA file contains more than 400,000 DNA sequences, and the code works only when I use about 10,000 sequences or fewer. Changing parameters such as k, or training a single model instead of running a grid search, gives the same error on the full file. Is there any way to work around this error?
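One common workaround, not specific to kebabs, is to train on a random subset small enough for the quadratic memory cost to fit. The sketch below picks the subset size from a memory budget. `maxSeqsForBudget`, `subsampleIndex` and the `copies` safety factor are hypothetical; how many n × n matrices kbsvm actually holds in memory at once is an assumption here, not something taken from the kebabs documentation.

```r
# Hypothetical helper: largest n whose dense n x n matrix of doubles, times an
# assumed number of simultaneous copies, fits in budgetGb (units of 1024^3 bytes).
maxSeqsForBudget <- function(budgetGb, copies = 2) {
  floor(sqrt(budgetGb * 1024^3 / (8 * copies)))
}

# Hypothetical helper: reproducible random subset of `size` indices from 1..n.
# Note: set.seed() resets the global random number generator.
subsampleIndex <- function(n, size, seed = 1) {
  set.seed(seed)
  sort(sample.int(n, min(size, n)))
}

nTotal <- 400000                          # stand-in for length(fastas)
size <- maxSeqsForBudget(16, copies = 2)  # 16 GiB budget -> 32768 sequences
idx <- subsampleIndex(nTotal, size)
```

With the real data, `idx` would subset both inputs consistently, for example `kbsvm(x = fastas[idx], y = y[idx], ...)`, where `y` is the numeric score vector (such as the first column of `scores`). The trade-off is that the model sees only part of the training data.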

Tags: R, svm

Written 6.5 years ago by kevin.l.yang; updated 6.5 years ago by zx8754