6.8 years ago, by Ashley
    library("Biostrings")
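I want to calculate the probability of the dinucleotides AA, TT, AT, and TA at each 2-base position (positions 1-2, 3-4, ..., 19-20), that is, the fraction of sequences that have that dinucleotide at that position.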
My DNA sequence is as follows:
DNA.set
  A DNAStringSet instance of length 5
    width seq                                                              names               
[1]    20 TCCGTATTGGAAAGCTCGTC                                             SEQ-1
[2]    20 TTAGACCACTCCGCATGTAG                                             SEQ-2
[3]    20 CTGTGGTACGGCTCAAACGG                                             SEQ-3
[4]    20 CTCCCGCCTATCTCCCTTCT                                             SEQ-4
[5]    20 TCGCCTAGAAAAAGTTTCCT                                             SEQ-5
I want to obtain the result as follows:
AA=0,0,0,0,1/5,2/5,0,1/5,0,0
TT=1/5,0,0,1/5,0,0,0,1/5,1/5,0
AT=0,0,0,0,0,0,0,1/5,0,0
TA=0,0,1/5,1/5,1/5,0,0,0,0,0
Any help would be greatly appreciated.
The dinucleotideFrequency function of Biostrings gives the counts of those 2-mers. You can then subset the ones you want.
Thanks for your reply. However, I want the frequency (probability) of these A/T dinucleotides at each position, not the total count per sequence, so dinucleotideFrequency may not be suitable for me.
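The per-position calculation described here can be sketched in base R. This is only one possible approach, not a Biostrings function. It assumes that "position" means non-overlapping 2-base windows (1-2, 3-4, ..., 19-20), that all sequences have the same width, and that the sequences are available as a plain character vector (with Biostrings loaded, as.character(DNA.set) should give one). The names seqs, starts, windows, targets and window_prob are made up for this illustration.

```r
# Sequences copied from the question; as.character(DNA.set) is assumed
# to yield an equivalent named character vector.
seqs <- c(
  "SEQ-1" = "TCCGTATTGGAAAGCTCGTC",
  "SEQ-2" = "TTAGACCACTCCGCATGTAG",
  "SEQ-3" = "CTGTGGTACGGCTCAAACGG",
  "SEQ-4" = "CTCCCGCCTATCTCCCTTCT",
  "SEQ-5" = "TCGCCTAGAAAAAGTTTCCT"
)

# Start positions of the non-overlapping 2-base windows: 1, 3, ..., 19
starts <- seq(1, nchar(seqs[1]) - 1, by = 2)

# Rows = windows, columns = sequences; each cell holds one dinucleotide
windows <- sapply(seqs, function(s) substring(s, starts, starts + 1))

# For each dinucleotide of interest, the fraction of sequences that
# carry it in each window (rows = dinucleotides, columns = windows)
targets <- c("AA", "TT", "AT", "TA")
window_prob <- t(sapply(targets, function(d) rowMeans(windows == d)))
colnames(window_prob) <- paste0("w", seq_along(starts))

print(window_prob)
```

Each row of window_prob has 10 entries, one per window, so the number of columns follows the sequence width rather than the number of possible dinucleotides.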
Maybe consensusMatrix(dinucleotideFrequency(DNA.set))?
For our example,
Thanks for your kind help. However, I think the number of columns should be the sequence length divided by 2 (20/2 = 10), but the result has 16 columns. It also does not show which columns correspond to AA, AT, TA, and TT. I am new to bioinformatics; could you help me figure this out? Thank you so much. Best wishes.
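To illustrate where 16 columns can come from, here is a base R sketch that mimics a per-sequence dinucleotide count table. It is not the Biostrings implementation. It assumes that the 16 columns are the 16 possible dinucleotides in the order AA, AC, AG, AT, CA, ..., TT, and that counting slides along each sequence one base at a time (overlapping pairs). The helper count_dinucs and the other names are hypothetical.

```r
seqs <- c(
  "SEQ-1" = "TCCGTATTGGAAAGCTCGTC",
  "SEQ-2" = "TTAGACCACTCCGCATGTAG",
  "SEQ-3" = "CTGTGGTACGGCTCAAACGG",
  "SEQ-4" = "CTCCCGCCTATCTCCCTTCT",
  "SEQ-5" = "TCGCCTAGAAAAAGTTTCCT"
)

# All 4 x 4 = 16 possible dinucleotides: AA, AC, AG, AT, CA, ..., TT
bases <- c("A", "C", "G", "T")
dinucs <- as.vector(t(outer(bases, bases, paste0)))

# Hypothetical helper: count every overlapping pair of adjacent bases
# in one sequence, returning one count per entry of dinucs
count_dinucs <- function(s) {
  pairs <- substring(s, 1:(nchar(s) - 1), 2:nchar(s))
  tabulate(match(pairs, dinucs), nbins = length(dinucs))
}

# Rows = sequences, columns = the 16 dinucleotides
counts <- t(vapply(seqs, count_dinucs, integer(length(dinucs))))
colnames(counts) <- dinucs

# Pick out the columns of interest by name
print(counts[, c("AA", "AT", "TA", "TT")])
```

In a table like this the column count is fixed by the alphabet, not by the sequence width, and position information is lost because each cell is a total over the whole sequence.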
I think the number of columns is always 16, one per possible dinucleotide (4 x 4 = 16). For another example,