Question: Sequence kernel SVM kebabs software memory error
13 days ago, kevin.l.yang wrote:

I am trying to build a support vector regression model that assigns a score to a DNA sequence, using the kebabs package from Bioconductor (https://bioconductor.org/packages/release/bioc/html/kebabs.html). Here is my code:

library(kebabs)
library(Biostrings)

# Training sequences and their scores
fastas = readDNAStringSet('train.fa')
scores = read.csv('train_scores.csv', sep = '\t', header = FALSE)

# Spectrum kernels for k = 5, 6, 7 with linear and exponential distance weighting
specKlin = spectrumKernel(k = 5:7, distWeight = linWeight(sigma = 72))
specKexp = spectrumKernel(k = 5:7, distWeight = expWeight(sigma = 72))
allspecK = c(specKlin, specKexp)

# Grid search over all kernels and nu values with nu-SVR from e1071
nus = c(.5, .6, .7, .8)
model <- kbsvm(x = fastas, y = scores, kernel = allspecK, pkg = 'e1071', svm = 'nu-svr', nu = nus, showProgress = TRUE)

However, I get this output:

Grid Search Progress:

Kernel_1  Error: cannot allocate vector of size 1557.2 Gb

In addition: Warning message:
grid search without cross validation (cross=0)

My FASTA file contains more than 400,000 DNA sequences, and the code only works when I use about 10,000 or fewer of them. Changing parameters such as k, or training a single model instead of grid searching, produces the same error on the full file. Is there any way to work around this error?
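
For scale, here is a rough back-of-the-envelope check in base R. It assumes the failed allocation is a dense n x n kernel matrix of 8-byte doubles, which the error message does not confirm, and that R reports "Gb" in units of 1024^3 bytes. The helper names `kernel_matrix_gib` and `max_sequences` are made up for illustration and are not part of kebabs.

```r
# Size in GiB of a dense n x n matrix of 8-byte doubles.
# Assumption: the failed allocation is the full kernel matrix.
kernel_matrix_gib <- function(n) n^2 * 8 / 1024^3

# Largest n whose dense n x n matrix fits in a memory budget given in GiB.
max_sequences <- function(budget_gib) floor(sqrt(budget_gib * 1024^3 / 8))

kernel_matrix_gib(10000)     # under 1 GiB
kernel_matrix_gib(400000)    # more than 1000 GiB

# Sequence count implied by the 1557.2 Gb in the error message
sqrt(1557.2 * 1024^3 / 8)    # roughly 457,000

# Hypothetical 16 GiB budget for the matrix alone
max_sequences(16)
```

Under these assumptions the requested 1557.2 Gb corresponds to about 457,000 sequences, which matches "more than 400,000", and memory grows with the square of the number of sequences, which would explain why 10,000 sequences fit and the full set does not.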

Tags: svm, R