Question: Sequence kernel SVM kebabs software memory error
12 months ago, kevin.l.yang wrote:

I am trying to build a support vector regression model that assigns a score to a DNA sequence. I am using the kebabs package from Bioconductor (https://bioconductor.org/packages/release/bioc/html/kebabs.html). Here is my code:

library(kebabs)
library(Biostrings)

# Training sequences and their scores
fastas <- readDNAStringSet('train.fa')
scores <- read.csv('train_scores.csv', sep = '\t', header = FALSE)

# Spectrum kernels for k = 5, 6, 7 with linear and exponential distance weighting
specKlin <- spectrumKernel(k = 5:7, distWeight = linWeight(sigma = 72))
specKexp <- spectrumKernel(k = 5:7, distWeight = expWeight(sigma = 72))
allspecK <- c(specKlin, specKexp)

# Grid search over all kernels and nu values
nus <- c(.5, .6, .7, .8)
model <- kbsvm(x = fastas, y = scores, kernel = allspecK, pkg = 'e1071', svm = 'nu-svr', nu = nus, showProgress = TRUE)

However, I get this output:

Grid Search Progress:

Kernel_1  Error: cannot allocate vector of size 1557.2 Gb

In addition: Warning message:
grid search without cross validation (cross=0)

My FASTA file has more than 400,000 DNA sequences, and the code only works when I use about 10,000 or fewer of them. Changing parameters such as k, or training a single model instead of grid searching, results in the same error when I use the full FASTA file. Is there any way to work around this error?
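
For context, the size in the error message is roughly what a dense n x n kernel matrix of doubles would need for this many sequences. The sketch below is only a back-of-the-envelope check in base R. It assumes the kernel matrix is stored as a full numeric matrix at 8 bytes per entry, and that R's "Gb" means 2^30 bytes. The helper names `kernel_matrix_gib` and `implied_n` are made up for illustration and are not part of kebabs.

```r
# Rough memory estimate for a dense n x n kernel matrix.
# Assumption: full numeric (double) matrix, 8 bytes per entry.
kernel_matrix_gib <- function(n, bytes_per_entry = 8) {
  n^2 * bytes_per_entry / 2^30
}

# Inverse: number of sequences implied by a reported allocation size (in GiB).
implied_n <- function(gib, bytes_per_entry = 8) {
  sqrt(gib * 2^30 / bytes_per_entry)
}

kernel_matrix_gib(10000)    # about 0.75 GiB, fits on an ordinary machine
kernel_matrix_gib(400000)   # about 1192 GiB
implied_n(1557.2)           # roughly 457,000 sequences
```

Under these assumptions, memory grows quadratically with the number of sequences. This would explain why changing k or dropping the grid search does not help: the matrix dimensions depend only on how many sequences are passed in.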
