As the title says, I have a list of strings (more than 10,000). For each string in the list, I want to know whether it matches any pattern in my pattern list (also more than 10,000). The approach I came up with is:
library(stringr)
# Return TRUE if `str` matches at least one pattern in pattern_list
any_match <- function(str) {
  any(sapply(pattern_list, function(x) str_detect(str, x)))
}
# Apply any_match to every element of str_list
sapply(str_list, any_match)
It works, but it is very slow. Is there a faster way to do this?
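One common way to speed this up, sketched here under assumptions: collapse the patterns into a single regular expression joined with |, so each string is scanned once per chunk of patterns instead of once per pattern. The helper name any_match_fast, the chunk size of 500 and the example data are hypothetical. The sketch uses base R's grepl(perl = TRUE) so it needs no packages, and it assumes the patterns are ordinary regular expressions without numbered backreferences (combining patterns shifts group numbers). Chunking is there because one alternation built from 10,000+ patterns can exceed the regex engine's size limit.

```r
# Hypothetical example data; replace with your own str_list and pattern_list
str_list <- c("apple pie", "banana split", "cherry tart", "plain toast")
pattern_list <- c("^app", "split$", "berry")

# Hypothetical helper: TRUE for each string that matches at least one pattern.
# Patterns are joined into alternations such as "(?:^app)|(?:split$)".
# The non-capturing groups keep anchors inside one pattern from leaking into
# its neighbours; patterns are processed in chunks of `chunk_size` so that no
# single combined regex grows too large.
any_match_fast <- function(strings, patterns, chunk_size = 500) {
  hit <- logical(length(strings))  # starts as all FALSE
  chunks <- split(patterns, ceiling(seq_along(patterns) / chunk_size))
  for (chunk in chunks) {
    todo <- !hit                   # only strings that have not matched yet
    if (!any(todo)) break
    combined <- paste0("(?:", chunk, ")", collapse = "|")
    hit[todo] <- grepl(combined, strings[todo], perl = TRUE)
  }
  hit
}

any_match_fast(str_list, pattern_list)
#> [1]  TRUE  TRUE FALSE FALSE
```

The same combined pattern can also be passed to str_detect(); stringr uses ICU rather than PCRE, so the regex dialects differ slightly and it is worth checking a few known matches. If the patterns are literal text rather than regexes, they would need escaping before being combined.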
Please give an example of both lists and an example of what you would consider a valid match. Using two sapply calls in such a short command is always suspicious; I am sure we can find a faster solution.
Sample files: pattern_list.txt (pattern list) and string_list.txt (string list).
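On the point about the two nested sapply calls, a smaller change is possible: grepl() is already vectorised over the strings, so only the patterns need an explicit loop. That replaces roughly length(str_list) times length(pattern_list) separate R-level calls with length(pattern_list) vectorised ones. The helper any_match_vec and the example data below are hypothetical; this is a sketch, not a benchmarked solution.

```r
# Hypothetical example data
str_list <- c("apple pie", "banana split", "cherry tart", "plain toast")
pattern_list <- c("^app", "split$", "berry")

# Hypothetical helper: each lapply() step tests ONE pattern against ALL
# strings at once; Reduce() then ORs the per-pattern logical vectors,
# starting from an all-FALSE vector so an empty pattern list is handled.
any_match_vec <- function(strings, patterns) {
  Reduce(`|`, lapply(patterns, grepl, x = strings), logical(length(strings)))
}

any_match_vec(str_list, pattern_list)
#> [1]  TRUE  TRUE FALSE FALSE
```

If the patterns are plain substrings rather than regexes, passing fixed = TRUE through lapply (lapply(patterns, grepl, x = strings, fixed = TRUE)) skips regex interpretation and is usually faster still.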
I can't load string_list.txt in R; it complains about duplicates.
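The exact error isn't shown, so this is an assumption: a "duplicates" complaint often comes from read.table() or read.csv() trying to use a column as row names. If the file simply holds one string per line, readLines() avoids that entirely, because it returns a plain character vector with no row names. The temporary file and its contents below are hypothetical, created only so the sketch runs on its own.

```r
# Hypothetical: write a small sample file so the example is self-contained
path <- tempfile(fileext = ".txt")
writeLines(c("apple pie", "apple pie", "banana split"), path)

# readLines() returns one character string per line; duplicate lines are
# simply kept, since there are no row names to clash
str_list <- readLines(path)
length(str_list)
#> [1] 3
```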
I tried to use separate(sample, c('description','id','Source_Name'), sep = "\\.") to get the last part of each pattern. However, because the sample strings (string_list) are messy, it produces warnings. I fixed them manually, but I am looking for an automatic way to do this.
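If only the last dot-separated field is needed, one way around separate()'s warnings, sketched under the assumption that "." is the only separator, is to strip everything up to the last dot with a regex. That works no matter how many dots a messy string contains. The example strings are hypothetical.

```r
# Hypothetical messy examples with a varying number of "." separators
x <- c("desc.id.SourceA", "long.desc.with.dots.id.SourceB", "no_dots_here")

# The greedy ".*\\." matches everything up to and including the LAST literal
# dot; sub() deletes that match, leaving only the final field. Strings with
# no dot are returned unchanged.
last_part <- sub(".*\\.", "", x)
last_part
#> [1] "SourceA"      "SourceB"      "no_dots_here"
```

The warnings themselves come from separate() finding more or fewer pieces than the names given in into; its extra and fill arguments control that behaviour, but they do not by themselves pick out the last field.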
OK, the string_list.txt you provided is very messy. Is there a formatting mistake?