Biostrings: Subsetting vmatchPattern Results On An XStringSet-Class Object
9.5 years ago
Timtico ▴ 330

I would like to get the substrings for every pattern match in an XStringSet-class object. The code below works, but it ignores multiple matches within the same sequence, and I suspect there is a better way to do this with Biostrings functions.

I load a fasta file into a XStringSet-class object and then search for a specific string using the vmatchPattern function:

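# Read the FASTA file; the argument is named filepath (R is case-sensitive)
genes <- readDNAStringSet(filepath = "filename", format = "fasta", use.names = TRUE)
# Find every occurrence of the pattern in each sequence (returns an MIndex)
view <- vmatchPattern(pattern = "CCGGA", subject = genes)
# Flatten to a single IRanges; each range is named after its sequence
matches <- unlist(view, recursive = TRUE, use.names = TRUE)
# Matrix of match starts and widths, with sequence names as row names
m <- as.matrix(matches)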

I then retrieve a 20-base substring beginning at the start of each match:

subseq(genes[rownames(m),], start = m[rownames(m),1], width = 20)

What is a better way to do this that includes all possible matches and uses Biostrings functions?
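For reference, here is a minimal sketch of one way this might be approached, not a definitive answer. It assumes the sequence names are unique and uses a small made-up `DNAStringSet` (the names `geneA`, `geneB`, `geneC` and their sequences are hypothetical) in place of the FASTA file. The idea is to index `genes` once per match using the names carried by the unlisted ranges, rather than indexing a matrix by row name, which returns only the first row for a repeated name. Clipping the window at the end of each sequence is an added assumption, since `subseq` fails when the requested width runs past the end.

```r
library(Biostrings)

# Hypothetical toy data standing in for the FASTA file
genes <- DNAStringSet(c(
  geneA = "AACCGGATTTTTTTTTTTTTTTTTTTTCCGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
  geneB = "GGGGGGGG",
  geneC = "TTCCGGACC"
))

# One set of match ranges per sequence
hits <- vmatchPattern(pattern = "CCGGA", subject = genes)

# Flatten to one IRanges; a sequence name appears once per match
all_hits <- unlist(hits)

# Repeat each sequence once per match (assumes unique sequence names)
per_hit_seqs <- genes[names(all_hits)]

# Window of up to 20 bases from each match start, clipped at the sequence end
win_end <- pmin(start(all_hits) + 20L - 1L, width(per_hit_seqs))
flanks <- subseq(per_hit_seqs, start = start(all_hits), end = win_end)

flanks
```

Because `start(all_hits)` is used directly instead of a lookup by row name, sequences with several matches contribute one substring per match.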

r bioconductor • 2.2k views
