Convert regex into DNAString object

Asked 5.2 years ago by elisheva

Hi,
I am trying to search a sequence for a pattern such that a specific nucleotide does not occur on either side of it.
For example, given the following sequence:

x <- DNAString("TGCTTGCGCA")

I want to extract all occurrences of GC that have no T immediately before or after them.
Only one occurrence qualifies: the sequence contains TGCT, TGCG and CGCA, and only CGCA meets the condition.
In other words, written as a regular expression, the pattern is [^T]GC[^T].
But I can't find any way to implement this with the Biostrings package.

I really hope you can help me figure it out.
Thanks for your help.
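A minimal sketch, not from the original thread and assuming Biostrings is installed: matchPattern() does not accept regular expressions, but with fixed = FALSE it expands IUPAC ambiguity codes in the pattern. The code V stands for "A, C or G", i.e. "not T", so the pattern VGCV should behave like [^T]GC[^T] here. One caveat: fixed = FALSE may also let ambiguity codes in the subject (such as N) match; see the fixed argument in ?matchPattern.

```r
library(Biostrings)

x <- DNAString("TGCTTGCGCA")

# IUPAC code V = A, C or G ("not T"), so "VGCV" encodes GC with a non-T base
# on each side. fixed = FALSE makes matchPattern() expand the ambiguity codes
# instead of looking for a literal letter V.
hits <- matchPattern("VGCV", x, fixed = FALSE)

hits              # expected: a single view, CGCA at positions 7-10
start(hits) + 1   # position of the G of the qualifying GC
```

Like the regex, this never reports a GC at the very start or end of the sequence, because both flanking bases must exist.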

R bioconductor Biostrings

What is the problem with just converting the DNAString to a character and doing your regex with that?

Comment by benformatics, 5.2 years ago
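For comparison, a minimal sketch of that suggestion in base R (illustrative, not from the thread). The string below is what as.character() returns for the DNAString in the question. A Perl-compatible regex with a lookbehind (?<=[^T]) and a lookahead (?=[^T]) checks the flanking bases without consuming them, so each reported position points at the G of a qualifying GC and neighbouring hits are not swallowed.

```r
# Same text as as.character(DNAString("TGCTTGCGCA"))
s <- "TGCTTGCGCA"

# (?<=[^T]) : the preceding base exists and is not T
# (?=[^T])  : the following base exists and is not T
pos <- gregexpr("(?<=[^T])GC(?=[^T])", s, perl = TRUE)[[1]]

as.integer(pos)   # 8: the GC at positions 8-9, inside CGCA
```

gregexpr() returns -1 when there is no match, so that value needs to be handled before using the positions.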

Because I am working with a StringSet and want the analysis to be as fast as possible. If I convert every single interval into a character string, I suspect it will be much slower.
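A minimal sketch of how the set case might be handled without converting anything to character, assuming the same IUPAC trick (V = "not T") and a DNAStringSet as input: vmatchPattern() runs one pattern against every sequence in the set and returns an MIndex. The sequences below are made up for illustration, and no timing claims are implied.

```r
library(Biostrings)

# Hypothetical example set; replace with the real DNAStringSet.
seqs <- DNAStringSet(c(s1 = "TGCTTGCGCA", s2 = "AGCCTGCA", s3 = "TTTT"))

# "VGCV": GC with an A, C or G (never T) on both sides.
m <- vmatchPattern("VGCV", seqs, fixed = FALSE)

countIndex(m)   # number of qualifying GCs per sequence
startIndex(m)   # start of each 4-letter VGCV match, per sequence

# Position of the G of each qualifying GC, per sequence
lapply(startIndex(m), function(st) st + 1L)
```

For s2, the GC at positions 2-3 qualifies (A before, C after), while the GC at 6-7 is rejected because it is preceded by T.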