3.6 years ago
alfonso
I'm looking for specific hexamers in a set of target sequences (a DNAStringSet) using Biostrings.
I can find one hexamer and then subset the original DNAStringSet to keep only those sequences that DO NOT have a match:
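hexamer1 <- DNAString("ATTAAA")
# Ranges of every match, named by the sequence each match occurs in
ATTAAA <- unlist(vmatchPattern(hexamer1, target_seq))
# Keep only the sequences without a match
target_seq_new <- target_seq[!names(target_seq) %in% names(ATTAAA)]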
Then I want to start all over again with a new hexamer, searching only the sequences that are left:
hexamer2 <- DNAString("ATCTAA")
ATCTAA <- unlist(vmatchPattern(hexamer2, target_seq_new))
target_seq_new <- target_seq_new[!names(target_seq_new) %in% names(ATCTAA)]
How can I turn this into a single function that takes a vector of hexamers and goes through all of them in a step-wise manner, removing the matched sequences after each step? For example:
hexamers <- c("AAGAAA", "AATACA", "AATAGA", "AATATA", "AATGAA", "ACTAAA", "AGTAAA", "CATAAA", "GATAAA", "TATAAA", "TTTAAA")
As output, I would like a list with one element per hexamer, like this one:
$AATAAA
IRanges object with 5966 ranges and 0 metadata columns:
                       start       end     width
                   <integer> <integer> <integer>
  FBgn0037332:TT05        29        34         6
  FBgn0011300:TT02        25        30         6
  FBgn0011300:TT02        39        44         6

$ATTAAA
IRanges object with 1375 ranges and 0 metadata columns:
                       start       end     width
                   <integer> <integer> <integer>
  FBgn0051619:TT03        42        47         6
  FBgn0010352:TT04        17        22         6
  FBgn0261822:TT05        10        15         6

$AATATA
IRanges object with 1267 ranges and 0 metadata columns:
                       start       end     width
                   <integer> <integer> <integer>
  FBgn0013272:TT02        42        47         6
  FBgn0013272:TT02        42        47         6
  FBgn0085391:TT04        11        16         6
Is there a function that I can use to do this recursive match?
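To make the request concrete, here is a rough sketch of the kind of function I have in mind. It only wraps the two steps above in a loop. The name `match_stepwise` and the `hits`/`remaining` elements are hypothetical (not part of Biostrings), and it assumes the sequences in the DNAStringSet have unique names.

```r
library(Biostrings)

# Hypothetical helper, not a Biostrings function: match the hexamers one at a
# time and drop every matched sequence before trying the next hexamer.
# Assumes `seqs` is a DNAStringSet with unique, non-NULL names.
match_stepwise <- function(hexamers, seqs) {
  stopifnot(!is.null(names(seqs)))
  hits <- vector("list", length(hexamers))
  names(hits) <- hexamers
  for (hex in hexamers) {
    # Stop early if nothing is left to search; later entries stay NULL
    if (length(seqs) == 0L) break
    # IRanges of all matches, named by the sequence each match occurs in
    m <- unlist(vmatchPattern(hex, seqs))
    hits[[hex]] <- m
    # Keep only the sequences without a match for the next round
    seqs <- seqs[!names(seqs) %in% names(m)]
  }
  list(hits = hits, remaining = seqs)
}
```

With this, `res <- match_stepwise(hexamers, target_seq)` would give the named list of ranges in `res$hits` and the sequences that matched none of the hexamers in `res$remaining`. The order of `hexamers` matters here, because a sequence is only reported under the first hexamer it matches.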