R partial string matching with constraints
4.0 years ago
5utr ▴ 370

I would like to find matches between two lists of AA strings such that at least one of these three rules holds:

1) the query string is a substring of the subject string

2) the query string ends with the same pattern that the subject string starts with (any number of characters)

3) the query string starts with the same pattern that the subject string ends with (any number of characters)

Example of what constitutes a match or not

r grep pattern matching • 1.2k views

What have you tried? Right now, it looks like you're asking us to solve your problem, and that is not a worthwhile use of the community's time.

4.0 years ago
5utr ▴ 370

There is probably a more concise way, but this works:

library(stringr)  # provides str_sub()

queries <- c('GTASQ', 'QSD', 'DRARTK', 'GTASQW')
subject <- c('ASQSDRA')

lapply(queries, function(X)
  # rule 1: the peptide falls completely within the sequence
  grepl(X, subject) |
  # rule 2: or the peptide ends with the start of the sequence
  any(unlist(lapply(nchar(X):1, function(i)
    grepl(paste0('^', str_sub(X, -i, -1)), subject)))) |
  # rule 3: or the peptide starts with the end of the sequence
  any(unlist(lapply(1:nchar(X), function(i)
    grepl(paste0(str_sub(X, 1, i), '$'), subject))))
)
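
The snippet above checks a single subject. Since the question asks about two lists, here is a rough base-R sketch that applies the same three rules to every query/subject pair. The helper name `overlaps` and the `hits` matrix are my own, not from any package, and the comparison is on literal strings rather than regular expressions.

```r
# Hypothetical helper (illustration only): TRUE if `query` matches `subject`
# under any of the three rules from the question.
overlaps <- function(query, subject) {
  # Rule 1: query is a literal substring of subject
  if (grepl(query, subject, fixed = TRUE)) return(TRUE)

  # Candidate overlap lengths: 1 up to the length of the shorter string
  n  <- seq_len(min(nchar(query), nchar(subject)))
  nq <- nchar(query)
  ns <- nchar(subject)

  # Rule 2: a suffix of the query equals the prefix of the subject
  rule2 <- any(substring(query, nq - n + 1, nq) == substring(subject, 1, n))
  # Rule 3: a prefix of the query equals the suffix of the subject
  rule3 <- any(substring(query, 1, n) == substring(subject, ns - n + 1, ns))

  rule2 || rule3
}

queries  <- c("GTASQ", "QSD", "DRARTK", "GTASQW")
subjects <- c("ASQSDRA")  # append more subject strings here

# Logical matrix: one row per query, one column per subject
hits <- outer(queries, subjects, Vectorize(overlaps))
dimnames(hits) <- list(queries, subjects)
print(hits)
```

For the example data, the first three queries come out TRUE and GTASQW comes out FALSE. Note that "any number of characters" includes an overlap of a single residue, so with many subjects this will report a lot of chance matches; restricting `n` to a minimum overlap length would be one way to tighten it.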