Sequence kernel SVM kebabs software memory error
6.5 years ago

I am trying to create a support vector regression model that assigns a score to a DNA sequence. I am using the kebabs package from Bioconductor (https://bioconductor.org/packages/release/bioc/html/kebabs.html). Here is my code:

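    library(kebabs)
    library(Biostrings)

    fastas <- readDNAStringSet('train.fa')

    # tab-separated file with one score per line and no header
    scores <- read.csv('train_scores.csv', sep = '\t', header = FALSE)

    # spectrum kernels for k = 5, 6, 7 with linear and exponential distance weighting
    specKlin <- spectrumKernel(k = 5:7, distWeight = linWeight(sigma = 72))
    specKexp <- spectrumKernel(k = 5:7, distWeight = expWeight(sigma = 72))
    allspecK <- c(specKlin, specKexp)

    nus <- c(.5, .6, .7, .8)

    # the response is passed as a numeric vector (first column), not as a data frame
    model <- kbsvm(x = fastas, y = scores[[1]], kernel = allspecK, pkg = 'e1071',
                   svm = 'nu-svr', nu = nus, showProgress = TRUE)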

However, I get this output:

    Grid Search Progress:
    Kernel_1  Error: cannot allocate vector of size 1557.2 Gb
    In addition: Warning message:
    grid search without cross validation (cross=0)

My FASTA file contains more than 400,000 DNA sequences, and the code works only when I use about 10,000 sequences or fewer. With the full file, I get the same error even if I change parameters such as k, or train a single model instead of running a grid search. Is there any way to work around this error?
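For scale, the size in the error message is consistent with a dense n x n kernel matrix of 8-byte doubles. The base-R sketch below assumes exactly that (an inference from the error size, not something confirmed in the kebabs documentation); `kernelMatrixGb()`, `impliedN()` and `maxSequences()` are hypothetical helpers, not kebabs functions.

```r
# Hypothetical helpers (not part of kebabs). They ASSUME the failing allocation
# is a dense n x n matrix of doubles (8 bytes per entry). Sizes use units of
# 1024^3 bytes, which is what R means by "Gb" in allocation errors.

# Memory needed for an n x n kernel matrix
kernelMatrixGb <- function(n, bytesPerEntry = 8) {
  n^2 * bytesPerEntry / 2^30
}

# Number of sequences implied by a reported allocation size
impliedN <- function(gb, bytesPerEntry = 8) {
  sqrt(gb * 2^30 / bytesPerEntry)
}

# Largest n whose kernel matrix alone fits in a given memory budget
maxSequences <- function(ramGb, bytesPerEntry = 8) {
  floor(sqrt(ramGb * 2^30 / bytesPerEntry))
}

round(kernelMatrixGb(10000), 2)   # 0.75 Gb for 10,000 sequences
round(kernelMatrixGb(400000), 1)  # 1192.1 Gb for 400,000 sequences
round(impliedN(1557.2))           # roughly 457,000 sequences
maxSequences(16)                  # 46340 sequences for a 16 Gb budget
```

If that assumption holds, memory grows with the square of the number of sequences and does not depend on k, nu, or how many kernels are in the grid, which would explain why changing those settings makes no difference.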
