Question: how to do vmatchPattern() for DNAStringSet [edited]
0
gravatar for catherine12243
5.6 years ago by
United States
catherine12243130 wrote:

I think I'm using the right function,and there is no error message. But I didn't get the result.Here is what I did.

    library(BSgenome)
    bd<-DNAString("CAGGTAG")
    dna <- readDNAStringSet("sequence.txt")
    zld_bindingsite<-vmatchPattern(bd,dna, max.mismatch=0)

And my result is:

    MIndex object of length 5050
    $`dm3_cytoBand_101F1 range=chr4:69669-90134 5'pad=0 3'pad=0 strand=+repeatMasking=none`
    IRanges of length 0
    
    $`dm3_cytoBand_102A1 range=chr4:90135-128922 5'pad=0 3'pad=0 strand=+ repeatMasking=none`
    IRanges of length 0
    
    $`dm3_cytoBand_102A2 range=chr4:128923-130909 5'pad=0 3'pad=0 strand=+ repeatMasking=none`
    IRanges of length 0
    
    ...
    <5047 more elements>

But the DNAStringSet looks fine.

    head(dna)
    A DNAStringSet instance of length 6
    width       seq                                                                                                            names               
    [1] 20466 AATTGTCCTTATTAATTGACGATGCT...ATTATGCGCCTGATTTAAAGTACAAA dm3_cytoBand_101F...
    [2] 38788 ATGTGTAAATATATCACCTTACCGTC...CATGCTAATCACTCGCTTTTAAATAG dm3_cytoBand_102A...
    [3]  1987 CAACTTGGGCCACCTTCATCACCAAT...GACAATCCTGTATATCCTATAACACT dm3_cytoBand_102A...
    [4] 24243 AGAGCCTTATGTAATCCACCTCATTT...TTCACTTGTAATTTTAAATTGTTCAG dm3_cytoBand_102A...
    [5] 38787 AAATGTTCGCAAGTCAGAGCAAGGCT...AGCGGGAGCTTCTAAGACTTTGTTTT dm3_cytoBand_102A...
    [6]  1987 TCGTGTTTTGCTCGTTCACAAAATCC...ATTATTAAAATACTTTCAAAAATAAA dm3_cytoBand_102

But I expect to see the results like this:

test<-BString("AATTGTCCTTATTAATTCAGGTAGGACGATGCTATATTATTAGTCTTTCCCAGGTAGGACATTACCCGCTAGTGAAATGTGATATTCTGAATCTTTCCGCTGTAAGCGCTTTAAGTTTGTTGCAATTA")
> bd<-DNAString("CAGGTAG")
> zld_bindingsite<-matchPattern(bd,test, max.mismatch=0)
> zld_bindingsite
  Views on a 128-letter BString subject
subject: AATTGTCCTTATTAATTCAGGTAGGACGATGCTATAT...TCTTTCCGCTGTAAGCGCTTTAAGTTTGTTGCAATTA
views:
    start end width
[1]    18  24     7 [CAGGTAG]
[2]    51  57     7 [CAGGTAG]

I actually can extract the seq by using toString(), but in this way I can only have one single string. Is there any function that can extract all the sequences from this StringSet??

R • 2.2k views
ADD COMMENTlink modified 5.6 years ago by Mattias Aine570 • written 5.6 years ago by catherine12243130
1
gravatar for Mattias Aine
5.6 years ago by
Mattias Aine570
Sweden
Mattias Aine570 wrote:

Haven't worked much with these functions but I'll give it a try.

Your object "zld_bindingsite" is an MIndex object that contains all the hits in your StringSet (the first 3 sequences don't seem to have a match).

As you are looking for an exact pattern you don't really need the sequences from the string set, only the positions.

unlist(zld_bindingsite) should give you an IRanges object with all positions together with the sequence names the hits are in.

as.matrix(unlist(zld_bindingsite) ) gives you the output in matrix format with sequence IDs as rownames, motif start point in the first column and motif width in the second.

You might then also want to run the reverse complement of your motif in the same way and again extract all hits to get the full list.

ADD COMMENTlink written 5.6 years ago by Mattias Aine570
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1249 users visited in the last hour